CN110751267B - Neural network structure searching method, training method, device and storage medium

Info

Publication number
CN110751267B
CN110751267B
Authority
CN
China
Prior art keywords
neural network
network
network layer
determined
output
Prior art date
Legal status
Active
Application number
CN201910943886.6A
Other languages
Chinese (zh)
Other versions
CN110751267A (en)
Inventor
李婷
张钧波
宋礼
郑宇
Current Assignee
Jingdong City Beijing Digital Technology Co Ltd
Original Assignee
Jingdong City Beijing Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Jingdong City Beijing Digital Technology Co Ltd
Priority to CN201910943886.6A
Publication of CN110751267A
Application granted
Publication of CN110751267B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/903 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Feedback Control In General (AREA)

Abstract

The embodiment of the invention provides a structure searching method, a training method, a device and a storage medium of a neural network. The structure searching method of the neural network comprises the following steps: based on a preset network framework, linking each intermediate network layer in at least one intermediate network layer by using a structure search variable and a structure search complete set to obtain the output of a neural network to be determined; the preset network architecture comprises an input network layer, an output network layer and at least one intermediate network layer; determining a first loss function based on the output of the neural network to be determined and a training set; determining a second loss function based on the output of the neural network to be determined and a validation set; updating model parameters of the neural network to be determined based on a first loss function; updating the structural parameters of the neural network to be determined based on a second loss function; and determining the structure of the neural network to be determined by using the updated model parameters and the updated structure parameters.

Description

Neural network structure searching method, training method, device and storage medium
Technical Field
The invention relates to the technical field of machine learning, in particular to a structure searching method, a training method, a device and a storage medium of a neural network.
Background
Machine Learning (ML) is a branch of artificial intelligence whose purpose is to enable a machine to learn from prior knowledge, so that the machine gains the logical ability to classify and judge. Machine learning models, represented by neural networks, are constantly evolving and are increasingly being used in various industries. However, in the related art, the design of a neural network must combine expert experience with a large number of parameter-tuning experiments, and the whole design process has to be completed manually, which is time-consuming and labor-intensive.
Disclosure of Invention
In order to solve the existing technical problems, embodiments of the present invention provide a neural network structure searching method, a training method, a device, and a storage medium, which can implement automatic search of a neural network structure, save time, and improve efficiency.
The embodiment of the invention provides a structure searching method of a neural network, which comprises the following steps:
based on a preset network architecture, linking each intermediate network layer in at least one intermediate network layer by using a structure search variable and a structure search complete set to obtain the output of a neural network to be determined; the preset network architecture comprises an input network layer, an output network layer and at least one intermediate network layer;
determining a first loss function based on the output of the neural network to be determined and a training set; determining a second loss function based on the output of the neural network to be determined and a validation set;
updating model parameters of the neural network to be determined based on the first loss function; updating structural parameters of the neural network to be determined based on the second loss function;
and determining the structure of the neural network to be determined by using the updated model parameters and the updated structure parameters.
In the above scheme, the linking each intermediate network layer in at least one intermediate network layer by using a structure search variable and a structure search corpus based on a preset network architecture to obtain an output of a neural network to be determined includes:
obtaining the output of the hybrid convolution operation of the current intermediate network layer by utilizing a structure controller and a model parameter controller; the hybrid convolution operation includes a plurality of basic convolution operations; the structure controller is the weight value of each candidate quantity included in the structure search variable; the model parameter controller is the weight value corresponding to the convolution kernel of each basic convolution operation;
obtaining, by the structure controller, an output of the hybrid connection operation of the current intermediate network layer;
obtaining the output of the current intermediate network layer by using the output of the hybrid convolution operation and the output of the hybrid connection operation;
and obtaining the output of the neural network to be determined by utilizing the output of the current intermediate network layer.
In the foregoing solution, the obtaining an output of a hybrid convolution operation of a current intermediate network layer by using a structure controller and a model parameter controller includes:
summing products of each basic convolution operation and the weighted value corresponding to each basic convolution operation to obtain the output of the mixed convolution operation of the current intermediate network layer; the structure controller comprises a weight value corresponding to each basic convolution operation; the basic convolution operation is obtained using the model parameter controller and the output of an intermediate network layer previous to the current intermediate network layer.
In the foregoing solution, the obtaining, by using the structure controller, the output of the hybrid connection operation of the current intermediate network layer includes:
summing products of each basic connection operation and the weight value corresponding to each basic connection operation to obtain the output of the hybrid connection operation of the current intermediate network layer; the hybrid connection operation comprises a plurality of basic connection operations; the structure controller includes a weight value corresponding to each basic connection operation.
In the above solution, the updating the model parameter of the neural network to be determined based on the first loss function includes:
updating the model parameters of the neural network to be determined by using a gradient descent method based on the first loss function;
the updating of the structural parameters of the neural network to be determined based on the second loss function comprises:
and updating the structural parameters of the neural network to be determined by using a gradient descent method based on the second loss function.
In the above scheme, the determining the structure of the neural network to be determined by using the updated model parameters and the updated structure parameters includes:
circularly updating the model parameters and the structural parameters until the first loss function and the second loss function are converged;
and determining the structure of the neural network to be determined based on the model parameters and the structure parameters when the first loss function and the second loss function are converged.
An embodiment of the present invention further provides a structure search apparatus for a neural network, including:
the first determining unit is used for linking each intermediate network layer in at least one intermediate network layer by using a structure search variable and a structure search complete set based on a preset network architecture to obtain the output of a neural network to be determined; the preset network architecture comprises an input network layer, an output network layer and at least one intermediate network layer;
a second determining unit, configured to determine a first loss function based on an output of the neural network to be determined and a training set; determining a second loss function based on the output of the neural network to be determined and a validation set;
an updating unit, configured to update a model parameter of the neural network to be determined based on the first loss function; updating structural parameters of the neural network to be determined based on the second loss function;
and the third determining unit is used for determining the structure of the neural network to be determined by using the updated model parameters and the updated structure parameters.
The embodiment of the invention also provides a training method of the neural network, which comprises the following steps:
acquiring an input data set;
inputting the input data set into a neural network to be trained to obtain a predicted value of the neural network to be trained;
determining a loss function of the neural network to be trained based on the predicted value;
updating model parameters of the neural network to be trained based on the loss function;
the neural network to be trained is obtained by searching based on the neural network structure searching method provided by the embodiment of the invention.
An embodiment of the present invention further provides a training apparatus for a neural network, including:
an acquisition unit for acquiring an input data set;
the prediction unit is used for inputting the input data set into a neural network to be trained to obtain a prediction value of the neural network to be trained;
the determining unit is used for determining a loss function of the neural network to be trained based on the predicted value;
the updating unit is used for updating the model parameters of the neural network to be trained based on the loss function;
the neural network to be trained is obtained by searching based on the neural network structure searching method provided by the embodiment of the invention.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the structure searching method of the neural network provided by the embodiment of the invention or realizing the training method of the neural network provided by the embodiment of the invention when the executable instructions stored in the memory are executed.
The embodiment of the present invention further provides a storage medium, where the storage medium stores executable instructions, and when the executable instructions are executed by at least one processor, the method for searching a structure of a neural network provided in the embodiment of the present invention is implemented, or the method for training a neural network provided in the embodiment of the present invention is implemented.
The embodiment of the invention provides a structure searching method, a training method, a device and a storage medium of a neural network. The structure searching method of the neural network comprises the following steps: based on a preset network architecture, linking each intermediate network layer in at least one intermediate network layer by using a structure search variable and a structure search complete set to obtain the output of a neural network to be determined; the preset network architecture comprises an input network layer, an output network layer and at least one intermediate network layer; determining a first loss function based on the output of the neural network to be determined and a training set; determining a second loss function based on the output of the neural network to be determined and a validation set; updating model parameters of the neural network to be determined based on the first loss function; updating structural parameters of the neural network to be determined based on the second loss function; and determining the structure of the neural network to be determined by using the updated model parameters and the updated structure parameters. In the embodiment of the invention, the structure search complete set is linked to each intermediate network layer of the neural network to be determined so as to obtain an output which can embody all the selectable network structure schemes of the neural network to be determined; the output and the training set are then utilized to update the model parameters of the neural network to be determined, and the output and the verification set are utilized to update the structure parameters of the neural network to be determined, until the optimal model parameters and structure parameters are found; finally, the optimal model parameters and structure parameters are utilized to obtain the structure of the neural network to be determined. Therefore, the automatic searching of the neural network structure can be realized, the time is saved, and the efficiency is improved.
Drawings
FIG. 1 is a schematic diagram of a traffic flow prediction model space-time residual error network in the related art;
fig. 2 is a schematic flow chart illustrating an implementation of a structure searching method of a neural network according to an embodiment of the present invention;
FIG. 3 is a first schematic diagram of a default network architecture of a neural network to be determined according to an embodiment of the present invention;
FIG. 4 is a second schematic diagram of a default network architecture of a neural network to be determined according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a hybrid convolution operation according to an embodiment of the present invention;
FIG. 6 is a schematic illustration of a hybrid connection operation according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a traffic flow prediction model according to an embodiment of the present invention;
fig. 8 is a schematic diagram illustrating an implementation of a structure search method of a traffic flow prediction model according to an embodiment of the present invention;
FIG. 9a is a schematic diagram of a network structure NAS-C-Net aiming at proximity characteristics obtained by the structure searching method according to the embodiment of the present invention;
FIG. 9b is a schematic diagram of a network structure NAS-P-Net aiming at periodic characteristics obtained by using the structure searching method according to the embodiment of the present invention;
FIG. 9c is a schematic diagram of a network structure NAS-T-Net for trend characteristics obtained by the structure search method according to the embodiment of the present invention;
fig. 10a is a schematic diagram of a network structure for predicting taxi traffic in beijing city according to the structure search method in the embodiment of the present invention;
fig. 10b is a schematic diagram of a network structure for predicting the pedestrian traffic of a host city, which is obtained by using the structure search method according to the embodiment of the present invention;
fig. 11 is a diagram illustrating a structure of a structure searching apparatus of a neural network according to an embodiment of the present invention;
fig. 12 is a schematic flow chart illustrating an implementation of a neural network training method according to an embodiment of the present invention;
FIG. 13 is a diagram illustrating the structure of a training apparatus for neural networks according to an embodiment of the present invention;
fig. 14 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention;
FIG. 15a is a diagram showing the comparison of the RMS error of the model ST-NASNET obtained by the structure search method according to the embodiment of the present invention and the expert designed network structure ST-ResNet for different convolution filters;
FIG. 15b is a graph showing the comparison of the mean absolute error of the model ST-NASNET obtained by the structure search method according to the embodiment of the present invention and the expert designed network structure ST-ResNet for different convolution filters;
FIG. 15c is a graph showing the comparison of the mean absolute percentage error of the model ST-NASNET obtained by the structure search method according to the embodiment of the present invention and the expert designed network structure ST-ResNet for different convolution filters;
FIG. 15d is a diagram showing the comparison result of the root mean square error of the model ST-NASNET obtained by the structure search method according to the embodiment of the present invention and the expert-designed network structure ST-ResNet for different learning rates;
FIG. 15e is a diagram showing the comparison result of the mean absolute error between the model ST-NASNET obtained by the structure search method according to the embodiment of the present invention and the expert-designed network structure ST-ResNet for different learning rates;
FIG. 15f is a diagram showing the comparison result of the mean absolute percentage error of the model ST-NASNET obtained by the structure search method according to the embodiment of the present invention and the expert-designed network structure ST-ResNet for different learning rates;
FIG. 15g is a diagram showing the comparison of the root mean square error between the model ST-NASNET obtained by the structure search method according to the embodiment of the present invention and the expert-designed network structure ST-ResNet for different test set sizes;
FIG. 15h is a diagram showing the comparison of the mean absolute error between the model ST-NASNET obtained by the structure search method according to the embodiment of the present invention and the expert-designed network structure ST-ResNet for different test set sizes;
FIG. 15i is a diagram showing the comparison of the mean absolute percentage error between the model ST-NASNET obtained by the structure search method according to the embodiment of the present invention and the expert-designed network structure ST-ResNet for different test set sizes.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.
The structure of the neural network in the related art generally depends on the manual design of related professionals, and the final determination of the neural network generally needs a large amount of experiments to debug the parameters of the related model, so that the neural network with a relatively fixed network structure is obtained after the parameter debugging is completed. However, under the condition that the network structure of the artificially designed neural network is relatively fixed, the neural networks cannot learn the characteristics of different aspects represented by the same input data set in a targeted manner, and cannot automatically make adaptive adjustment on the network structure of the neural networks according to the characteristics of different input data sets.
For example, a traffic flow prediction model in the related art is a space-time Residual Network (ST-ResNet), and a main architecture diagram of the ST-ResNet is shown in fig. 1. The ST-ResNet can simultaneously consider the characteristics of three aspects of time proximity, periodicity and tendency, and also consider additional influencing factors such as weather, holidays, events and the like. The ST-ResNet utilizes three residual error networks to respectively extract the characteristics of three aspects of proximity, periodicity and tendency; and extracting the characteristics of the additional influence factors by using a full-connection network, and then fusing the characteristics of the four aspects to finally realize the prediction of the traffic flow. However, the proximity, periodicity and trend are used for describing the traffic flow from different aspects, and ST-ResNet adopts the same network structure and has insufficient characterization capability on the characteristics of different aspects; meanwhile, different input data sets have different data distribution and rules, for example, the data distribution of the pedestrian flow of Beijing and the pedestrian flow of Xinjiang are different, and the sparsity is different, at the moment, the same network structure is used for different input data sets, which may cause insufficient learning or overfitting.
Based on this, in each embodiment of the present invention, the structure search corpus is linked to each intermediate network layer of the neural network to be determined, so as to obtain an output that can represent all selectable network structure schemes of the neural network to be determined; the output and the training set are then used to update the model parameters of the neural network to be determined, and the output and the verification set are used to update the structure parameters of the neural network to be determined, until the optimal model parameters and structure parameters are found; finally, the optimal model parameters and structure parameters are used to obtain the structure of the neural network to be determined. Furthermore, the neural networks obtained by searching can learn, in a targeted manner, the characteristics of different aspects represented by the same input data set, and can also adaptively adjust their network structure according to the characteristics of different input data sets.
The embodiment of the invention provides a structure searching method of a neural network, and fig. 2 is a schematic flow chart illustrating the implementation of the structure searching method of the neural network according to the embodiment of the invention. As shown in fig. 2, the method comprises the steps of:
step 201: based on a preset network architecture, linking each intermediate network layer in at least one intermediate network layer by using a structure search variable and a structure search complete set to obtain the output of a neural network to be determined; the preset network architecture comprises an input network layer, an output network layer and at least one intermediate network layer;
step 202: determining a first loss function based on the output of the neural network to be determined and a training set; determining a second loss function based on the output of the neural network to be determined and a validation set;
step 203: updating model parameters of the neural network to be determined based on the first loss function; updating structural parameters of the neural network to be determined based on the second loss function;
step 204: and determining the structure of the neural network to be determined by using the updated model parameters and the updated structure parameters.
In step 201, the preset network architecture is a preset network architecture of the neural network to be determined.
Consider a general neural network architecture comprising: an input network layer, an intermediate network layer and an output network layer; wherein the intermediate network layers comprise at least one intermediate network layer; the current intermediate network layer is linked with its previous network layer through a convolution operation, and is linked with each network layer before it through a skip connection operation. Here, the current intermediate network layer is the one among the plurality of intermediate network layers that is being processed when performing the structure search. The network layers preceding the current intermediate network layer may include the input network layer and the intermediate network layers before the current intermediate network layer.
In this embodiment of the present application, the preset network architecture of each neural network to be determined comprises: an input network layer, an intermediate network layer and an output network layer; wherein the intermediate network layers comprise at least one intermediate network layer; the current intermediate network layer is linked with its previous network layer through a hybrid convolution operation (mix convolution operation), and is linked with each network layer before it through a hybrid skip connection operation (mix skip connection operation).
It should be noted that: here, a convolution operation may be understood as one fixed basic convolution operation, and a connection operation may be understood as one fixed basic connection operation; a hybrid convolution operation may be understood as an operation covering a plurality of possible basic convolution operations, and a hybrid connection operation as an operation covering a plurality of possible basic connection modes.
Fig. 3 is a schematic diagram of a preset network architecture of a neural network to be determined, the network architecture including: an input network layer, an intermediate network layer and an output network layer; wherein the intermediate network layer comprises four intermediate network layers. In practical applications, the number of intermediate network layers included in the neural network is referred to as the network layer number. The inputs in FIG. 3 represent the input network layers; 1, 2, 3, 4 in fig. 3 represent four intermediate network layers, respectively, namely: a first intermediate network layer, a second intermediate network layer, a third intermediate network layer, a fourth intermediate network layer; the output in fig. 3 represents the output network layer. The current intermediate network layer of the four intermediate network layers is linked with the previous network layer of the current intermediate network layer through hybrid convolution operation; and the current intermediate network layer is linked with each layer of network layer before the current intermediate network layer through hybrid connection operation. For example, when the third intermediate network layer in fig. 3 is used as the current intermediate network layer in the structure search, the third intermediate network layer is linked to the second intermediate network layer through a hybrid convolution operation, and the third intermediate network layer is linked to the input network layer, the first intermediate network layer, and the second intermediate network layer through a hybrid connection operation.
It should be noted that: the network layer number is a hyper-parameter and does not belong to the structure search variable of the neural network. In practical application, the number of network layers can be adjusted according to practical situations. In the network architecture of the embodiment of the present application, the number of network layers is four for example, but the number of network layers is not limited to the number of network layers in the network architecture of the present invention. Here, the hyper-parameter is a parameter of which a value is set before the learning process is started, not parameter data obtained by training.
Here, a structure search variable refers to a network structure parameter of the neural network to be determined that needs to be determined by the structure search method. From the network architecture of fig. 3, it can be seen that two structure search variables need to be determined for each intermediate network layer, namely a first structure search variable and a second structure search variable. The first structure search variable is specifically: through which kind of basic convolution operation the current intermediate network layer among the plurality of intermediate network layers is linked with its previous network layer. The second structure search variable is specifically: how the current intermediate network layer performs connection operations with each network layer before it.
After determining the structure search variables, it is necessary to further determine a structure search corpus of the structure search variables, where the structure search corpus refers to a set of all candidate quantities included in the structure search variables. The candidate amounts may be specifically understood as: for the first structure search variable, the candidate quantity is all basic convolution operations contained in the mixed convolution operation; for the second structure search variable, the candidate quantities are all basic join operations contained in the hybrid join operation.
In practical applications, specific types of convolution operations include: standard convolution, separable convolution, dilated (hole) convolution, etc.; meanwhile, the sizes of the convolution operations specifically include: 1 × 1, 3 × 3, 5 × 5, 7 × 7, etc. The type and size of the convolution operation may be selected according to the actual situation in practical applications.
In some embodiments, the hybrid convolution operation includes six basic convolution operations: standard convolution 3 × 3, standard convolution 5 × 5, separable convolution 3 × 3, separable convolution 5 × 5, dilated convolution 3 × 3, and dilated convolution 5 × 5. Here, the six basic convolution operations constitute the structure search corpus of the first structure search variable.
In some embodiments, the hybrid connection operation includes two basic connection operations: connected and not connected. Here, the two basic connection operations constitute the structure search corpus of the second structure search variable.
Here, the output of the neural network to be determined is a weighted summation representation of all possible link modes of each intermediate network layer in the neural network to be determined.
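For illustration only, the two structure search corpora described above can be written down in code. The following PyTorch sketch is an assumption, not part of the patent disclosure; all names (e.g. make_candidate_ops) are hypothetical:

```python
# Illustrative sketch of the structure search corpus, assuming PyTorch;
# none of these names come from the patent text.
import torch.nn as nn

def make_candidate_ops(channels: int) -> nn.ModuleList:
    """The six basic convolution operations of the hybrid convolution."""
    def separable(k: int) -> nn.Module:
        # separable convolution = depthwise convolution + 1x1 pointwise
        return nn.Sequential(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels),
            nn.Conv2d(channels, channels, 1),
        )
    return nn.ModuleList([
        nn.Conv2d(channels, channels, 3, padding=1),              # standard 3x3
        nn.Conv2d(channels, channels, 5, padding=2),              # standard 5x5
        separable(3),                                             # separable 3x3
        separable(5),                                             # separable 5x5
        nn.Conv2d(channels, channels, 3, padding=2, dilation=2),  # dilated 3x3
        nn.Conv2d(channels, channels, 5, padding=4, dilation=2),  # dilated 5x5
    ])

# The hybrid connection corpus has exactly two candidates:
# "connected" and "not connected".
```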
In some embodiments, the linking, based on the preset network architecture, each of the at least one intermediate network layer by using a structure search variable and a structure search complete set to obtain the output of the to-be-determined neural network includes:
step a 1: obtaining the output of the hybrid convolution operation of the current intermediate network layer by utilizing a structure controller and a model parameter controller; the hybrid convolution operation includes a plurality of basic convolution operations; the structure controller is the weight value of each candidate quantity included in the structure search variable; the model parameter controller is the weight value corresponding to the convolution kernel of each basic convolution operation;
step b 1: obtaining, by the structure controller, an output of the hybrid connection operation of the current intermediate network layer;
step c 1: obtaining the output of the current intermediate network layer by using the output of the hybrid convolution operation and the output of the hybrid connection operation;
step d 1: and obtaining the output of the neural network to be determined by utilizing the output of the current intermediate network layer.
Fig. 4 is a schematic diagram of the preset network architecture of the neural network to be determined; fig. 4 and fig. 3 are different expressions of an exemplary preset network architecture. In fig. 4, there are L intermediate network layers in total (L is a positive integer equal to or greater than 1); if the current intermediate network layer is referred to as the l-th intermediate network layer, then l = 1, 2, …, L.
In step a1, the model parameter controller is configured to apply a weight value to the convolution kernel of each basic convolution operation included in the hybrid convolution operation. The following description will be given taking as an example that the hybrid convolution operation includes six basic convolution operations.
In practical application, the model parameter controller $w_{model}$ comprises the weight values corresponding to the convolution kernels of the six basic convolution operations, i.e.

$$w_{model} = \{ w_{k}^{sc3},\; w_{k}^{sc5},\; w_{k}^{sep3},\; w_{k}^{sep5},\; w_{k}^{dil3},\; w_{k}^{dil5} \}$$

where $w_{k}^{sc3}$ and $w_{k}^{sc5}$ represent the weight values of the convolution kernels of the 3 × 3 and 5 × 5 standard convolutions, $w_{k}^{sep3}$ and $w_{k}^{sep5}$ represent those of the 3 × 3 and 5 × 5 separable convolutions, and $w_{k}^{dil3}$ and $w_{k}^{dil5}$ represent those of the 3 × 3 and 5 × 5 dilated convolutions. After the structure search is completed, the weight values in the model parameter controller represent the importance of each convolution kernel in the corresponding basic convolution operation: the greater the weight value, the more important the convolution kernel of the basic convolution operation corresponding to that weight value.
The structure controller is used for characterizing the weight value of each candidate quantity included by the structure search variable of the neural network to be determined. The following description will be given taking as an example that the hybrid convolution operation includes six basic convolution operations, and the hybrid join operation includes two basic join operations.
In practical application, the structure controller $w_{arc}$ may include two controllers: a convolution controller $w_p$ and a skip connection controller $w_S$, i.e. $w_{arc} = \{w_p, w_S\}$. The convolution controller $w_p$ holds the weight values corresponding to the six basic convolution operations, and is used for controlling, after the structure search is completed, through which basic convolution operation the current intermediate network layer is linked with its previous network layer, i.e.

$$w_p \in \mathbb{R}^{l_p \times n_p}$$

where $l_p = L$, L being the number of network layers, and $n_p$ is the number of kinds of basic convolution operations included in the hybrid convolution operation, i.e. $n_p = 6$. The skip connection controller $w_S$ holds the weight values corresponding to the two basic connection operations, and is used for controlling, after the structure search is completed, how the current intermediate network layer is connected with all the network layers preceding it, i.e.

$$w_S \in \mathbb{R}^{l_s \times n_s}$$

where $l_s$ is the number of possible connections between network layers, determined by the number of network layers L, and $n_s$ is the number of kinds of basic connection modes included in the hybrid connection operation, i.e. $n_s = 2$. Here, when the structure search is performed, the convolution controller $w_p$ and the connection controller $w_S$ both need to carry out all the possible links, i.e. the hybrid convolution operation and the hybrid connection operation.
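For illustration, the two controllers can be sketched as trainable weight matrices of the dimensions given above. This is an assumption, not the patent's code; in particular, the count of candidate connections below assumes each intermediate layer may connect to every preceding layer including the input layer:

```python
import torch

L = 4            # number of intermediate network layers (a hyper-parameter)
n_p, n_s = 6, 2  # sizes of the two structure search corpora

# Convolution controller w_p: one weight per basic convolution per layer.
w_p = torch.randn(L, n_p, requires_grad=True)

# Connection controller w_S: one weight pair per possible connection.
# Assuming layer l may connect to layers 0..l-1, there are
# 1 + 2 + ... + L = L * (L + 1) // 2 candidate connections in total.
l_s = L * (L + 1) // 2
w_S = torch.randn(l_s, n_s, requires_grad=True)
```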
It should be noted that: number n of kinds of all basic convolution operations included in the hybrid convolution operationpAnd the number n of kinds of all basic connection modes included in the hybrid connection operationsThe specific number of the active carbon atoms can be adjusted according to actual conditions.
It should be noted that: when the structure of the neural network to be determined is a plurality of neural networks, a plurality of structure controllers are required to respectively represent the weight values of the structure parameters of the respective candidate quantities included in the structure search variables of the plurality of neural networks.
Here, the process of the hybrid convolution operation is still explained in the case where the hybrid convolution operation includes the aforementioned six basic convolution operations.
Fig. 5 shows the process of the hybrid convolution operation between the previous network layer of the current intermediate network layer and the current intermediate network layer, i.e. the products of each basic convolution operation and its corresponding weight value are summed. The weights W0 to W5 in the convolution controller shown in fig. 5 correspond to standard convolution 3 × 3, standard convolution 5 × 5, separable convolution 3 × 3, separable convolution 5 × 5, dilated convolution 3 × 3, and dilated convolution 5 × 5, respectively. In practical application, after the structure search is completed, the larger a value among W0 to W5, the more important the corresponding basic convolution operation; the most important basic convolution operation is the optimal basic convolution operation between the previous network layer of the current intermediate network layer and the current intermediate network layer. For how to determine W0 to W5, refer to the description of step 203 below.
Based on this, in some embodiments, the obtaining, by the structure controller and the model parameter controller, an output of the hybrid convolution operation of the current intermediate network layer includes:
summing products of each basic convolution operation and the weighted value corresponding to each basic convolution operation to obtain the output of the mixed convolution operation of the current intermediate network layer; the structure controller comprises a weight value corresponding to each basic convolution operation; the basic convolution operation is obtained using the model parameter controller and the output of an intermediate network layer previous to the current intermediate network layer.
In practical applications, the hybrid convolution operation can be represented by the following formula (1):

$$\mathrm{MConvBlock}(x) = \sum_{i=1}^{n_p} w_{p}^{i}\, p_i(x) \tag{1}$$

where MConvBlock represents the hybrid convolution operation; $n_p$ represents the number of kinds of basic convolution operations contained in the hybrid convolution operation; $w_{p}^{i}$ represents the weight value corresponding to the $i$-th basic convolution operation; and $p_i(x)$ represents the $i$-th basic convolution operation.
Here, the relationship between $p_i(x)$ and $w_{model}$ can be represented by the following formula (2):

$$p_i(x) = w_{model}^{i} \ast x + b_{model}^{i} \tag{2}$$

where $p_i(x)$ represents the $i$-th basic convolution operation; $w_{model}^{i}$ represents the weight value corresponding to the convolution kernel of the $i$-th basic convolution operation; $b_{model}^{i}$ represents the bias of the convolution kernel of the $i$-th basic convolution operation; $w_{model}^{i}$ and $b_{model}^{i}$ correspond to each other, and the correspondence can be obtained by querying the set correspondence relation.
It should be noted that x is a variable: when MConvBlock is the hybrid convolution operation of the current intermediate network layer, i.e. the l-th layer, x is the output of the (l-1)-th intermediate network layer.
Substituting equation (2) for equation (1) may arrive at the output of the hybrid convolution operation of the current intermediate network layer.
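A minimal sketch of formula (1), assuming PyTorch and the hypothetical candidate-operation list from the earlier sketch:

```python
import torch

def mixed_conv(x: torch.Tensor, ops, weights: torch.Tensor) -> torch.Tensor:
    # weights: the row of w_p for the current layer, shape (n_p,);
    # each candidate convolution p_i(x) is scaled by its weight and summed.
    return sum(w * op(x) for w, op in zip(weights, ops))
```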
In step b1, the process of the hybrid connection operation is still illustrated taking the case where the hybrid connection operation includes the two basic connection operations described above.
Fig. 6 shows the process of the hybrid connection operation between a certain network layer among all the preceding network layers of the current intermediate network layer and the current intermediate network layer, i.e. the products of each basic connection operation and its corresponding weight value are summed. The weights W6 and W7 in the connection controller shown in fig. 6 correspond to the connected and not-connected operations, respectively. In practical application, after the structure search is completed, the larger value between W6 and W7 indicates the more important basic connection operation; the most important basic connection operation is the optimal basic connection operation between that network layer and the current intermediate network layer. For how to determine W6 and W7, refer to the description of step 203 below.
Based on this, in some embodiments, the obtaining, by the structure controller, the output of the hybrid connection operation of the current intermediate network layer includes:
summing products of each basic connection operation and the weight value corresponding to each basic connection operation to obtain the output of the hybrid connection operation of the current intermediate network layer; the hybrid connection operation comprises a plurality of basic connection operations; the structure controller includes a weight value corresponding to each basic connection operation.
In practical applications, the hybrid connection operation can be represented by the following formula (3):

$$\mathrm{MConnBlock}(x) = \sum_{i=1}^{n_s} w_{S}^{i}\, s_i(x) \tag{3}$$

where MConnBlock represents the hybrid connection operation; $n_s$ represents the number of kinds of basic connection operations included in the hybrid connection operation; $w_{S}^{i}$ represents the weight value corresponding to the $i$-th basic connection operation; and $s_i(x)$ represents the $i$-th basic connection operation.
It should be noted that x is a variable: when MConnBlock is a hybrid connection operation of the current intermediate network layer, i.e. the l-th layer, x is the output of the earlier intermediate network layer being connected. In addition, the basic connection operation here has no model parameters: when the basic connection operation is connected, $s_i(x)$ is the input of the connected intermediate network layer; when the basic connection operation is not connected, $s_i(x)$ contributes no input from that connection.
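A minimal sketch of formula (3), again an assumption rather than the patent's code; "connected" passes the earlier layer's output through, "not connected" contributes nothing:

```python
import torch

def mixed_connection(x: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    # weights: [w_connected, w_not_connected], one pair per candidate link.
    connected = x                        # s_1(x): pass the earlier output through
    not_connected = torch.zeros_like(x)  # s_2(x): the link contributes nothing
    return weights[0] * connected + weights[1] * not_connected
```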
In step c1, as can be seen from fig. 4, the output of the l-th intermediate network layer is associated with the hybrid convolution operation on the (l-1)-th network layer and with the hybrid connection operations to the 0-th (input network layer), 1st, 2nd, …, (l-1)-th network layers. Therefore, in practical applications, the output of the current intermediate network layer, i.e. the l-th intermediate network layer, can be expressed by the following formula (4):

$$o_l = \mathrm{MConvBlock}_l(o_{l-1}) + \sum_{i=1}^{l-1} \mathrm{MConnBlock}_{li}(o_i) \tag{4}$$

where $o_l$ is the output of the l-th intermediate network layer; $\mathrm{MConvBlock}_l(o_{l-1})$ represents the hybrid convolution operation of the l-th intermediate network layer; and $\mathrm{MConnBlock}_{li}(o_i)$ represents the $i$-th ($i$ = 1, 2, …, l-1) hybrid connection operation included in the l-th intermediate network layer.
Here, the output of the current intermediate network layer can be obtained by substituting formulas (1), (2) and (3) into formula (4).
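Combining the two sketches above gives a hedged illustration of formula (4); whether the input layer's output o_0 participates in the hybrid connections follows the reading of fig. 3 and fig. 4:

```python
def layer_output(prev_outputs, ops, conv_weights, conn_weight_rows):
    """Sketch of formula (4). prev_outputs = [o_0, ..., o_{l-1}], where o_0
    is the input network layer's output; conn_weight_rows holds one weight
    pair per hybrid connection of this layer."""
    # Hybrid convolution on the immediately preceding layer's output.
    out = mixed_conv(prev_outputs[-1], ops, conv_weights)
    # Hybrid connections to the earlier network layers (cf. fig. 3 and 4).
    for o_i, w in zip(prev_outputs, conn_weight_rows):
        out = out + mixed_connection(o_i, w)
    return out
```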
In addition, in fig. 4, it should be noted that, for each of the intermediate network layers of the neural network to be determined, a hybrid convolution operation and a plurality of hybrid connection operations included in the corresponding network layer need to be fused, and a fused output may be subjected to batch normalization (BatchNorm). Here, the batch normalization performs averaging and variance normalization on the output of a certain intermediate network layer in the middle of the neural network, so as to solve the problem that the data distribution of the intermediate network layer changes in the training process.
In step d1, since the output of the l-th intermediate network layer depends on the outputs of all the network layers before it, the outputs are calculated recursively in order, starting from the 1st intermediate network layer, until the output of the last, L-th, intermediate network layer is obtained; the output of the neural network to be determined is then obtained from the output of the L-th intermediate network layer through the output network layer.
In step 202, the data input into the neural network to be determined for structure search is referred to as an input data set. The input data set is divided into a training set and a validation set. The training set is mainly used for optimizing model parameters in the subsequent structure search; the verification set is mainly used for optimizing the structural parameters in the subsequent structural search. Here, the model parameter refers to a parameter of a convolution kernel in a neural network; the structural parameters refer to parameters of convolution operation and connection operation in the neural network.
Here, the first loss function is a loss function corresponding to a training set of the neural network to be determined; and the second loss function is a loss function corresponding to the verification set of the neural network to be determined.
In practical application, inputting a training set in an input data set into a neural network to be determined, obtaining the output of the training set in the neural network to be determined, namely a predicted value, by using the step 201, and obtaining a loss function corresponding to the training set by using the predicted value and the true value of the training set; inputting the verification set in the input data set into the neural network to be determined, the output of the verification set in the neural network to be determined, that is, the predicted value, can be obtained by using step 201, and the loss function corresponding to the verification set can be obtained by using the predicted value and the true value of the verification set.
In practical applications, there are various ways to determine the loss function according to the predicted value and the true value, and one of the ways is described below as an example.
For example, the loss function can be determined using formula (5):

$$\mathrm{loss} = \left( o_{pred} - o_{true} \right)^2 \tag{5}$$

where loss represents the loss function of the neural network to be determined in the embodiment of the application; $o_{pred}$ represents the predicted value; and $o_{true}$ represents the true value.
In step 203, the model parameters refer to the weight values in the model parameter controller; the configuration parameters refer to weight values in the configuration controller.
In practical application, the model parameters and the structural parameters can be updated and optimized by adopting a gradient descent algorithm.
Specifically, when the model parameters are optimized by using the loss function of the training set, the model parameters can be optimized by using equation (6);
$$w'_{model} = w_{model} - \alpha \nabla_{w_{model}} \mathrm{loss}_{train} \tag{6}$$

where $w_{model}$ is the current weight value corresponding to the convolution kernels of the basic convolution operations, i.e. the current model parameters; $w'_{model}$ is the updated weight value corresponding to the convolution kernels of the basic convolution operations, i.e. the updated model parameters; $\mathrm{loss}_{train}$ is the loss function of the training set; $\alpha$ is the learning rate of the model optimizer; and $\nabla_{w_{model}} \mathrm{loss}_{train}$ is the gradient of the loss function of the training set.
Specifically, when the structure parameters are optimized by using the loss function of the validation set, the structure parameters may be optimized by using formula (7).
$$w'_{arc} = w_{arc} - \beta \nabla_{w_{arc}} \mathrm{loss}_{valid} \tag{7}$$

where $w_{arc}$ is the current weight value corresponding to the basic convolution operations and the basic connection operations, i.e. the current structure parameters; $w'_{arc}$ is the updated weight value corresponding to the basic convolution operations and the basic connection operations, i.e. the updated structure parameters; $\mathrm{loss}_{valid}$ is the loss function of the validation set; $\beta$ is the learning rate of the structure optimizer; and $\nabla_{w_{arc}} \mathrm{loss}_{valid}$ is the gradient of the loss function of the validation set.
Here, the idea of the gradient descent method is to solve for the minimum value along the direction of the descending gradient. In practical application, initial $w_{model}$ and $w_{arc}$ may be determined randomly; the descent direction is then determined from the partial derivatives of the loss function, and a value smaller than the previous one is solved along that direction with preset step lengths, namely α in formula (6) and β in formula (7), until a minimum value is obtained. Here, α and β may be tried from small to large, and the optimal values are selected by testing separately.
It should be noted that, because the model parameters and the structure parameters are multiplied together in the output of the neural network to be determined, in order to optimize the result, in each iteration the structure parameters may first be fixed while the model parameters are optimized, and then the model parameters fixed while the structure parameters are optimized, iterating in this way until the optimal solution is reached.
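One round of this alternating optimization can be sketched as follows, assuming PyTorch; forward and compute_loss stand for the network output of step 201 and the loss of formula (5), and are hypothetical names rather than the patent's code:

```python
import torch

def search_step(forward, compute_loss, model_opt, arch_opt,
                train_batch, valid_batch):
    """One iteration: formula (6) on the training set, then formula (7)
    on the validation set."""
    x_t, y_t = train_batch
    model_opt.zero_grad()
    compute_loss(forward(x_t), y_t).backward()
    model_opt.step()   # updates only the model parameters (w_model)

    x_v, y_v = valid_batch
    arch_opt.zero_grad()
    compute_loss(forward(x_v), y_v).backward()
    arch_opt.step()    # updates only the structure parameters (w_p, w_S)
```

Because each optimizer steps only its own parameter group, the other group is effectively fixed during that half of the iteration.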
In step 204, in some embodiments, the determining the structure of the neural network to be determined by using the updated model parameters and the updated structure parameters includes:
circularly updating the model parameters and the structural parameters until the first loss function and the second loss function are converged;
and determining the structure of the neural network to be determined based on the model parameters and the structure parameters when the first loss function and the second loss function are converged.
In practical application, formulas (6) and (7) in step 203 are iteratively optimized multiple times until both the loss on the training set and the loss on the validation set of the input data set converge, e.g. the loss function approaches 0. At this time, after the search is finished, the basic convolution operation with the largest weight value is selected as the optimal convolution operation, the convolution kernel of that basic convolution operation as the optimal convolution kernel, and the basic connection operation with the largest weight value as the optimal connection operation; the searched network structure, i.e. the required neural network structure, is then determined from the optimal convolution operation, the optimal convolution kernel, and the optimal connection operation.
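The readout of the searched structure can be sketched as a simple argmax over the converged controller weights; this sketch is illustrative and assumes the controller shapes from the earlier sketches:

```python
import torch

def derive_structure(w_p: torch.Tensor, w_S: torch.Tensor):
    """Pick, per layer, the basic convolution with the largest weight, and
    keep a connection only where 'connected' outweighs 'not connected'."""
    best_conv_per_layer = w_p.argmax(dim=1)   # index into the six basic ops
    keep_connection = w_S[:, 0] > w_S[:, 1]   # True where the link is kept
    return best_conv_per_layer, keep_connection
```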
The structure searching method of the neural network provided by the embodiment of the invention, based on a preset network architecture, links each intermediate network layer in at least one intermediate network layer by using a structure search variable and a structure search complete set to obtain the output of the neural network to be determined; the preset network architecture comprises an input network layer, an output network layer and at least one intermediate network layer; determines a first loss function based on the output of the neural network to be determined and a training set; determines a second loss function based on the output of the neural network to be determined and a validation set; updates model parameters of the neural network to be determined based on the first loss function; updates structural parameters of the neural network to be determined based on the second loss function; and determines the structure of the neural network to be determined by using the updated model parameters and the updated structure parameters. In the embodiment of the invention, the structure search complete set is linked to each intermediate network layer of the neural network to be determined so as to obtain an output which can embody all the selectable network structure schemes of the neural network to be determined; the output and the training set are then utilized to update the model parameters of the neural network to be determined, and the output and the verification set are utilized to update the structure parameters of the neural network to be determined, until the optimal model parameters and structure parameters are found; finally, the optimal model parameters and structure parameters are utilized to obtain the structure of the neural network to be determined. Therefore, the automatic searching of the neural network structure can be realized, the time is saved, and the efficiency is improved.
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
The architecture of the traffic flow prediction model ST-NASNet proposed in the embodiment of the present application is shown in fig. 7. The traffic flow prediction model proposed in the embodiment of the present application differs from the traffic flow prediction model in the related art as follows: it adopts a neural network for proximity features, namely a network-structure-search-based proximity neural network (NAS-C-Net), a neural network for periodic features, namely a network-structure-search-based periodic neural network (NAS-P-Net), and a neural network for trend features, namely a network-structure-search-based trend neural network (NAS-T-Net), to respectively replace the three residual networks used in the related-art traffic flow prediction model for extracting the proximity, periodic and trend features of the input data set. NAS-C-Net, NAS-P-Net and NAS-T-Net are each neural networks learned automatically by the network structure search method, and their network structures are diverse. Here, the structure search method may also be referred to as an automatic machine learning method.
It should be noted that: the nature of the prediction model and the residual error network mentioned in the embodiments of the present application are both neural networks.
The following describes in detail how the structure search method is used to automatically learn the three neural networks to be determined in the traffic flow prediction model proposed in the embodiment of the present application, namely NAS-C-Net, NAS-P-Net and NAS-T-Net.
Fig. 8 is a schematic view illustrating an implementation of a structure search method for a traffic flow prediction model according to an embodiment of the present invention, as shown in fig. 8, the method includes the following steps:
step 801: acquiring an input data set, and dividing the input data set into a training set and a verification set;
the input data set is used as an input to the traffic flow prediction model. In practical application, the map of the preset area is divided into grids according to the preset length, and traffic flow conditions on each grid of the preset area are obtained according to the time sequence, so that a plurality of traffic flow class diagram network matrixes of the preset area, which are sorted according to the time stamps, are obtained, and the traffic flow class diagram network matrixes of the preset area, which are sorted according to the time stamps, are used as an input data set. Here, the preset region may be a city, the preset length may be 1Km, and the traffic flow may be a pedestrian flow, a bicycle flow, a taxi flow, a bus flow, or the like.
In practice, the input data set is obtained by receiving the input data set, for example, by receiving the input data set input by the relevant person through the input interface. Here, the input interface may be a keyboard, a mouse, or the like.
The training set is mainly used for optimizing model parameters in the subsequent structure search; the verification set is mainly used for optimizing the structural parameters in the subsequent structural search. Here, the model parameter refers to a parameter of a convolution kernel in a neural network; the structural parameters refer to parameters of convolution operation and connection operation in the neural network.
In practical application, the training set and the verification set can be divided according to the time stamp of each data in the input data set.
For example, suppose traffic flow conditions for February through September 2019 in Beijing are available, February through August 2019 serves as the input data set, and the traffic flow for September 2019 is to be predicted. Then, within the input data set, the traffic flow conditions for February through May 2019 may serve as the training set, those for June and July 2019 as the verification set, and those for August 2019 as the test set, while the traffic flow conditions for September 2019 serve as the true values of traffic flow.
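A minimal sketch of such a timestamp-based partition, assuming the matrices are keyed by pandas timestamps as in the preprocessing sketch above (the boundary dates mirror the Beijing example and are illustrative only):

```python
import pandas as pd

def split_by_timestamp(matrices, train_end, val_end, test_end):
    """Partition timestamped flow matrices into train/validation/test sets."""
    train = {t: m for t, m in matrices.items() if t < train_end}
    val = {t: m for t, m in matrices.items() if train_end <= t < val_end}
    test = {t: m for t, m in matrices.items() if val_end <= t < test_end}
    return train, val, test

# Usage mirroring the example: Feb-May train, Jun-Jul validation, Aug test.
# train, val, test = split_by_timestamp(
#     matrices,
#     train_end=pd.Timestamp("2019-06-01"),
#     val_end=pd.Timestamp("2019-08-01"),
#     test_end=pd.Timestamp("2019-09-01"),
# )
```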
When extracting the proximity, periodicity and trend features of the input data set, the three neural networks NAS-C-Net, NAS-P-Net and NAS-T-Net receive different input data according to the timestamps of the data in the input data set: NAS-C-Net receives data whose timestamps are close to the prediction time; NAS-P-Net receives data whose timestamps are farther from the prediction time; and NAS-T-Net receives data whose timestamps are farthest from the prediction time.
Here, close, farther, and farthest are relative temporal concepts. For example, to predict the traffic flow at 8 o'clock on a certain day, "close" may be understood as the traffic flow from 7 to 8 o'clock on that day; "farther" as the traffic flow on the day before; and "farthest" as the traffic flow in the week before. When the input data set only contains data from the month before that day, "close" may be the data in the input data set relatively closest to that day, "farther" the data relatively closer to that day, and "farthest" the data relatively far from that day.
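For illustration, the three kinds of inputs can be selected relative to the prediction time as in the following sketch (the frame counts and offsets are assumed defaults, not values prescribed by the patent):

```python
from datetime import timedelta

def slice_cpt_inputs(matrices, predict_time, lc=3, lp=2, lt=2,
                     step=timedelta(hours=1),
                     period=timedelta(days=1),
                     trend=timedelta(days=7)):
    """Select closeness (NAS-C-Net), period (NAS-P-Net) and trend
    (NAS-T-Net) input frames relative to predict_time; assumes the
    required timestamps exist as keys in `matrices`."""
    closeness = [matrices[predict_time - (i + 1) * step] for i in range(lc)]
    periodic = [matrices[predict_time - (i + 1) * period] for i in range(lp)]
    trending = [matrices[predict_time - (i + 1) * trend] for i in range(lt)]
    return closeness, periodic, trending
```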
Step 802: receiving a preset network architecture, a structure search variable and a structure search complete set of the structure search variable of a neural network to be determined in a traffic flow prediction model; the preset network architecture comprises an input network layer, an output network layer and at least one intermediate network layer;
here, the neural networks to be determined are three neural networks, namely NAS-C-Net, NAS-P-Net and NAS-T-Net.
Here, the preset network architectures of the three neural networks to be determined, namely NAS-C-Net, NAS-P-Net and NAS-T-Net, include: an input network layer, an intermediate network layer and an output network layer; wherein the intermediate network layers comprise at least one intermediate network layer; the current intermediate network layer is linked with the network layer immediately preceding it through a hybrid convolution operation (mix convolution operation); and the current intermediate network layer is linked with each network layer before it through a hybrid skip connection operation (mix skip connection operation). The schematic diagrams of the network architectures of the three neural networks NAS-C-Net, NAS-P-Net and NAS-T-Net can refer to the network architecture shown in FIG. 3. However, it should be noted that the four network layers in fig. 3 are for illustration only and do not limit the number of network layers in the network architecture of the present invention.
Here, a structure search variable refers to a network structure parameter of the neural network to be determined that needs to be found by the structure search method. Two structure search variables are determined for each intermediate network layer, namely a first structure search variable and a second structure search variable. The first structure search variable specifies which basic convolution operation links the current intermediate network layer with the network layer immediately preceding it; the second structure search variable specifies how the current intermediate network layer performs connection operations with each network layer before it.
After the structure search variables are determined, the structure search corpus of the structure search variables needs to be further determined; the structure search corpus refers to the set of all candidate quantities included in a structure search variable. The candidate quantities can be understood as follows: for the first structure search variable, the candidate quantities are all the basic convolution operations contained in the hybrid convolution operation; for the second structure search variable, the candidate quantities are all the basic connection operations contained in the hybrid connection operation.
In practical applications, the specific types of convolution operations include standard convolution, separable convolution, dilated convolution, and the like; the kernel sizes include 1 × 1, 3 × 3, 5 × 5, 7 × 7, and the like. The type and size of the convolution operation may be selected according to the actual situation.
For example, when a map of a certain city is divided into grids in units of 1 km, class diagram network matrices of the city are obtained by representing the traffic flow conditions on the city's grid over different time steps, and these matrices are used as the input data set of the embodiment of the present application. In practice, when the traffic flow to be predicted is bicycle flow, considering that a typical bicycle trip is around 3 km, kernel sizes of 3 × 3 (covering 3 km on the 1 km grid) and the adjacent 5 × 5 may be selected. In some embodiments, the hybrid convolution operation includes six basic convolution operations: standard convolution 3 × 3, standard convolution 5 × 5, separable convolution 3 × 3, separable convolution 5 × 5, dilated convolution 3 × 3, and dilated convolution 5 × 5. These six basic convolution operations constitute the structure search corpus of the first structure search variable. As can be seen from the above description, the size of the convolution operation can be chosen according to the type of vehicle.
In some embodiments, the hybrid connection operation includes two basic connection operations: connected and not connected. These two basic connection operations constitute the structure search corpus of the second structure search variable.
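A hypothetical PyTorch rendering of the hybrid convolution operation over this six-operation corpus is sketched below. The softmax normalization of the structure weights follows common differentiable-search practice (e.g., DARTS) and is an assumption, since the text only speaks of weight values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_candidates(channels):
    """Build the six basic convolutions named above (shape-preserving)."""
    def conv(k, dilation=1, separable=False):
        pad = dilation * (k - 1) // 2
        if separable:  # depthwise + pointwise
            return nn.Sequential(
                nn.Conv2d(channels, channels, k, padding=pad, groups=channels),
                nn.Conv2d(channels, channels, 1),
            )
        return nn.Conv2d(channels, channels, k, padding=pad, dilation=dilation)

    return nn.ModuleList([
        conv(3),                  # standard convolution 3x3
        conv(5),                  # standard convolution 5x5
        conv(3, separable=True),  # separable convolution 3x3
        conv(5, separable=True),  # separable convolution 5x5
        conv(3, dilation=2),      # dilated convolution 3x3
        conv(5, dilation=2),      # dilated convolution 5x5
    ])

class MixedConv(nn.Module):
    """Hybrid convolution: weighted sum of all candidate convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.ops = make_candidates(channels)
        # Structure parameters: one weight per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))
```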
In practical application, the preset network architecture, the structure search variables and the structure search corpus of the structure search variables of the neural network to be determined are obtained by receiving them, for example, as input from relevant personnel through an input interface.
Step 803: based on the preset network architecture, linking each intermediate network layer in the at least one intermediate network layer by using the structure search variable and the structure search complete set of the structure search variable to obtain the output of the neural network to be determined;
the outputs of the three neural networks to be determined, namely NAS-C-Net, NAS-P-Net and NAS-T-Net, are weighted summation representations of all possible link modes of each intermediate network layer in the neural network to be determined.
Here, the determination process of the output of the neural network to be determined may refer to the description of the foregoing step 201. The three neural networks, NAS-C-Net, NAS-P-Net and NAS-T-Net, can all obtain the corresponding outputs of each neural network by the method of step 201.
It should be noted that three different structure controllers need to be employed in the output of the neural network to be determined, to respectively represent the weight values of the structure parameters of each candidate quantity included in the structure search variables of the three neural networks NAS-C-Net, NAS-P-Net and NAS-T-Net.
It should be noted that there is no strict execution order between step 801 and steps 802 and 803; step 801 only needs to be completed before step 804 begins.
Step 804: obtaining a loss function corresponding to a training set of a traffic flow prediction model based on the output of the neural network to be determined and the training set; obtaining a loss function corresponding to a verification set of a traffic flow prediction model based on the output of the neural network to be determined and the verification set;
in practical applications, the specific implementation of step 804 includes:
step a2: fusing the output of NAS-C-Net, the output of NAS-P-Net, the output of NAS-T-Net and the output of the additional-influence network contained in the traffic flow prediction model, and obtaining the predicted value of the traffic flow prediction model through a tanh operation;
specifically, the predicted value of the traffic flow prediction model in the embodiment of the present application can be obtained by using equation (8).
o_pred = tanh(o_C + o_P + o_T + o_E)    (8)
where o_pred represents the predicted value of the traffic flow prediction model in the embodiment of the present application; o_C represents the output of NAS-C-Net; o_P represents the output of NAS-P-Net; o_T represents the output of NAS-T-Net; and o_E represents the output of the additional-influence network.
Note that o_E can be obtained by existing methods; since o_E is not the core focus of the present invention, it is not described here.
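Equation (8) amounts to a single fused activation; a minimal sketch, assuming the four outputs are tensors of identical shape:

```python
import torch

def fuse_predictions(o_c, o_p, o_t, o_e):
    """Equation (8): fuse the outputs of NAS-C-Net, NAS-P-Net, NAS-T-Net
    and the additional-influence network through tanh."""
    return torch.tanh(o_c + o_p + o_t + o_e)
```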
step b2: obtaining a loss function by using the predicted value of the traffic flow prediction model and the actual value of the traffic flow in the embodiment of the present application.
In practical application, the training set of the input data set is input into the traffic flow prediction model of the embodiment of the present application; the outputs of the three neural networks NAS-C-Net, NAS-P-Net and NAS-T-Net on the training set are obtained in step 803; the predicted value of the traffic flow prediction model is obtained by formula (8) in step a2; and the loss function corresponding to the training set of the traffic flow prediction model is obtained from this predicted value and the true value. Similarly, the verification set of the input data set is input into the traffic flow prediction model; the outputs of the three neural networks on the verification set are obtained in step 803; the predicted value is obtained by formula (8) in step a2; and the loss function corresponding to the verification set of the traffic flow prediction model is obtained from this predicted value and the true value.
Here, a specific method for obtaining the loss function by using the predicted value of the traffic flow prediction model and the actual value of the traffic flow in the embodiment of the present application may refer to step 202.
It should be noted that, when the corresponding loss function is determined by formula (5) in step 202, loss represents the loss function of the traffic flow prediction model in the embodiment of the present application; o_pred represents the predicted value of the traffic flow prediction model; and o_true represents the actual value of the traffic flow.
In practical application, the data representing the real value of the traffic flow may be acquired simultaneously with the input data set, and the acquisition mode of the data representing the real value of the traffic flow may be the same as the acquisition mode of the input data set.
Step 805: updating the model parameters of the neural network to be determined based on the loss function corresponding to the training set of the traffic flow prediction model; updating the structural parameters of the neural network to be determined based on the loss function corresponding to the verification set of the traffic flow prediction model;
here, the model parameter refers to a weight value in the model parameter controller, and the specific method for updating the model parameters of the three neural networks to be determined, i.e., NAS-C-Net, NAS-P-Net, and NAS-T-Net, according to the loss function corresponding to the training set of the traffic flow prediction model may refer to formula (6) in step 203.
Here, the structural parameters refer to weight values in the structure controller, and the specific method for updating the structural parameters of the three neural networks to be determined, i.e., NAS-C-Net, NAS-P-Net and NAS-T-Net, based on the loss function corresponding to the verification set of the traffic flow prediction model may refer to formula (7) in step 203.
It should be noted that the structural parameters of the three neural networks, NAS-C-Net, NAS-P-Net and NAS-T-Net, need to be updated correspondingly.
It should be noted that, because the model parameters and the structural parameters are multiplied together in the output of the neural network to be determined, the two sets of parameters are optimized alternately: in each iteration, the structural parameters are first fixed while the model parameters are optimized, and then the model parameters are fixed while the structural parameters are optimized; the iterations continue until the optimal solution is reached.
Step 806: and determining the structure of the neural network to be determined in the traffic flow prediction model by using the updated model parameters and the updated structure parameters.
In practical application, iterative optimization is performed multiple times until both the loss function corresponding to the training set and the loss function corresponding to the verification set of the traffic flow prediction model converge, e.g., until the loss functions approach 0. After the search is finished, the optimal convolution operation (the basic convolution operation with the largest weight value), the optimal convolution kernel (the convolution kernel of that basic convolution operation), and the optimal connection operation (the basic connection operation with the largest weight value) are selected, and the searched network structure, that is, the required neural network structure, is determined from these optimal operations.
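The discretization described here is an argmax over the learned structure weights. A minimal sketch, where the operation names and the per-layer weight layout are illustrative assumptions:

```python
import torch

def derive_architecture(alpha_conv, alpha_conn, conv_names, conn_names):
    """For each intermediate layer, keep the candidate convolution and
    connection operation whose structure weight is largest."""
    chosen = []
    for a_cv, a_cn in zip(alpha_conv, alpha_conn):  # one pair per layer
        conv = conv_names[int(torch.argmax(a_cv))]  # optimal convolution
        conn = conn_names[int(torch.argmax(a_cn))]  # optimal connection
        chosen.append((conv, conn))
    return chosen

# e.g. conv_names = ["std3x3", "std5x5", "sep3x3", "sep5x5", "dil3x3", "dil5x5"]
#      conn_names = ["connected", "not_connected"]
```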
In some embodiments, the network structures NAS-C-Net, NAS-P-Net, and NAS-T-Net learned by the three characteristics of proximity, periodicity, and trending are shown in FIG. 9a, FIG. 9b, and FIG. 9C, respectively, for the New York bicycle traffic data set.
It will be understood by those skilled in the art that the input data representing proximity are the data in the input data set whose timestamps are closest to the prediction time; in this case, the predicted traffic flow is highly correlated with all the traffic flows in the input data, i.e., the low-level features in NAS-C-Net (in a neural network, the more convolution layers the data has passed through, the higher-level the obtained features) greatly affect the final traffic flow prediction, so in fig. 9a the input network layer is directly connected to the output network layer. The periodic input data are the data whose timestamps are farther from the prediction time; in this case, the predicted traffic flow is highly correlated with the traffic flow at the same time of day in the input data, i.e., the features at a certain level in NAS-P-Net significantly affect the final traffic flow prediction, so in fig. 9b the output of a certain intermediate network layer (the third intermediate network layer) is connected to the output network layer. The trend input data are the data whose timestamps are farthest from the prediction time; in this case, the predicted traffic flow has a certain correlation with the traffic flow at every level in the input data, i.e., no single level in NAS-T-Net dominates and each contributes to the final traffic flow prediction, so in fig. 9c the input network layer is connected to a plurality of intermediate network layers.
As can be seen from fig. 9a, 9b and 9c, the residual structure is not necessarily the optimal network structure for one input data set, and the optimal network structures determined based on the three characteristics of proximity, periodicity and tendency are not the same.
In practical application, for different input data sets, the structure search method of the neural network can be utilized to perform the structure search respectively, so as to obtain different optimal network structures suitable for the respective input data sets.
For example, also for traffic flow prediction tasks, suppose there are two different input data sets: one is a data set of taxi flow in Beijing; the other is a data set of pedestrian flow in a dormitory area. Performing structure search on these two data sets separately with the above neural network structure search method yields the network structure for predicting Beijing taxi flow shown in fig. 10a and the network structure for predicting the pedestrian flow of the dormitory area shown in fig. 10b.
As can be seen from fig. 10a and 10b: for the Beijing taxi flow prediction task, where the city occupies a core geographic position and movement distances are large, the globally correlated features represented by the last intermediate network layer in the searched network structure are more important, as are the large-scale 5 × 5 convolution operations; for a relatively small area, such as a dormitory area with a smaller movement range, the features of lower intermediate network layers and the small-scale 3 × 3 convolution operations in the finally searched network structure work better. That is, for different input data sets, the above neural network structure search method can be used to perform structure search separately, thereby obtaining different optimal network structures suited to the respective input data sets.
In order to implement the method of the embodiment of the present invention, an embodiment of the present invention further provides a structure search apparatus for a neural network, which is disposed on an electronic device. Fig. 11 is a diagram illustrating a structure of a structure search apparatus of a neural network according to an embodiment of the present invention, and as shown in fig. 11, the structure search apparatus 1100 includes:
a first determining unit 1101, configured to link each intermediate network layer in the at least one intermediate network layer by using a structure search variable and a structure search corpus based on a preset network architecture, so as to obtain an output of a neural network to be determined; the preset network architecture comprises an input network layer, an output network layer and at least one intermediate network layer;
a second determining unit 1102, configured to determine a first loss function based on the output of the neural network to be determined and a training set; determining a second loss function based on the output of the neural network to be determined and a validation set;
an updating unit 1103, configured to update the model parameters of the neural network to be determined based on the first loss function; updating structural parameters of the neural network to be determined based on the second loss function;
a third determining unit 1104, configured to determine a structure of the neural network to be determined by using the updated model parameters and the updated structure parameters.
In some embodiments, the first determining unit 1101 includes:
the first determining module is used for obtaining the output of the mixed convolution operation of the current intermediate network layer by utilizing the structure controller and the model parameter controller; the hybrid convolution operation includes a plurality of basic convolution operations; the structure controller searches the weight value of each candidate quantity included in the variable for the structure; the model parameter controller is a weighted value corresponding to a convolution kernel of each basic convolution operation;
a second determining module, configured to obtain, by using the fabric controller, an output of the hybrid connection operation of the current intermediate network layer;
a third determining module, configured to obtain an output of the current intermediate network layer by using an output of the hybrid convolution operation and an output of the hybrid join operation;
and the fourth determining module is used for obtaining the output of the neural network to be determined by utilizing the output of the current intermediate network layer.
In some embodiments, the first determining module is specifically configured to:
summing the products of each basic convolution operation and the weight value corresponding to each basic convolution operation to obtain the output of the hybrid convolution operation of the current intermediate network layer; the structure controller comprises a weight value corresponding to each basic convolution operation; each basic convolution operation is computed using the model parameter controller and the output of the intermediate network layer one layer before the current intermediate network layer.
In some embodiments, the second determining module is specifically configured to:
summing the products of each basic connection operation and the weight value corresponding to each basic connection operation to obtain the output of the hybrid connection operation of the current intermediate network layer; the hybrid connection operation comprises a plurality of basic connection operations; the structure controller includes a weight value corresponding to each basic connection operation.
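For illustration, the hybrid connection operation computed by this second determining module can be sketched as a weighted sum over the two basic connection operations, where "connected" is the identity and "not connected" contributes zero (a hypothetical PyTorch rendering, with softmax normalization assumed):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedConnection(nn.Module):
    """Hybrid connection over all preceding network layers."""
    def __init__(self, num_predecessors):
        super().__init__()
        # One (connected, not-connected) weight pair per preceding layer.
        self.beta = nn.Parameter(torch.zeros(num_predecessors, 2))

    def forward(self, predecessor_outputs):
        w = F.softmax(self.beta, dim=1)
        # The "not connected" branch is zero, so only the identity
        # branch (weight w[i, 0]) contributes to the sum.
        return sum(w[i, 0] * h for i, h in enumerate(predecessor_outputs))
```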
In some embodiments, the updating unit 1103 includes:
a first updating module, configured to update the model parameters of the neural network to be determined by using a gradient descent method based on the first loss function;
and the second updating module is used for updating the structural parameters of the neural network to be determined by using a gradient descent method based on the second loss function.
In some embodiments, the third determining unit 1104 includes:
an update judgment module, configured to update the model parameter and the structural parameter in a loop until both the first loss function and the second loss function converge;
and the fifth determining module is used for determining the structure of the neural network to be determined based on the model parameters and the structure parameters when the first loss function and the second loss function are converged.
In practical applications, the first determining unit 1101, the first determining module, the second determining module, the third determining module, the fourth determining module, the second determining unit 1102, the updating unit 1103, the first updating module, the second updating module, the third determining unit 1104, the update judging module, and the fifth determining module may be implemented by a processor in a structure searching apparatus of a neural network.
It should be noted that: the structure search apparatus of the neural network provided in the above embodiment is described only in terms of the division of the above program modules when performing structure search; in practical applications, the above processing may be allocated to different program modules as needed, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the structure search apparatus of the neural network provided in the above embodiment and the embodiments of the structure search method of the neural network belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.
Fig. 12 is a schematic diagram illustrating an implementation of a training method for a neural network according to an embodiment of the present invention, as shown in fig. 12, the method includes the following steps:
step 1201: acquiring an input data set;
step 1202: inputting the input data set into a neural network to be trained to obtain a predicted value of the neural network to be trained;
step 1203: determining a loss function of the neural network to be trained based on the predicted value;
step 1204: updating model parameters of the neural network to be trained based on the loss function;
the neural network to be trained is obtained by searching based on the neural network structure searching method provided by the embodiment of the invention.
In step 1201, the input data set is used as input to the neural network to be trained.
In practical application, for example, when the neural network to be trained is a traffic flow prediction model, the map of a preset area (e.g., a certain city) is divided into grids of a preset length (e.g., in units of 1 km), and the traffic flow (e.g., pedestrian flow, bicycle flow, taxi flow, etc.) on each grid of the preset area is collected in time order, so as to obtain a plurality of class diagram network matrices of traffic flow of the preset area sorted by timestamp; these matrices are used as the input data set.
In practice, the input data set is obtained by receiving the input data set, for example, receiving data information including the input data set, which is input by a relevant person through an input interface (e.g., a keyboard, a mouse, etc.).
It should be noted that, in the embodiment of the present application, in the case that the network structure is determined, only the model parameters are optimized, and the optimization of the structure parameters is not involved, so that the input data set does not need to be divided into the training set and the verification set. In practical applications, the divided training set and the verification set in the embodiment of the present invention may be combined as the training set in the embodiment.
In step 1202, the neural network to be trained is obtained by searching based on the neural network structure searching method provided by the embodiment of the present invention.
In step 1203, when the neural network to be trained is a traffic flow prediction model, determining a value of a loss function of the traffic flow prediction model based on the predicted value and a true value of the traffic flow.
In practical application, the data representing the real value of the traffic flow may be acquired simultaneously with the input data set, and the acquisition mode of the data representing the real value of the traffic flow may be the same as the acquisition mode of the input data set. The calculation process of the loss function can refer to the description of step 203.
In step 1204, in practical application, the update step is iterated until the loss converges.
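A minimal sketch of such a training loop, assuming a mean-squared-error loss (the optimizer and hyperparameters here are illustrative, not taken from the patent):

```python
import torch

def train_searched_network(model, loader, epochs=50, lr=1e-3, tol=1e-6):
    """With the architecture fixed, only the model parameters are updated."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    prev = float("inf")
    for _ in range(epochs):
        total = 0.0
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            total += loss.item()
        if abs(prev - total) < tol:  # stop once the loss has converged
            break
        prev = total
    return model
```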
In this embodiment, the network structure to be trained is the one obtained through the neural network structure search method provided by the embodiment of the present invention.
In order to implement the method of the embodiment of the present invention, an embodiment of the present invention further provides a training apparatus for a neural network, which is disposed on an electronic device. Fig. 13 is a diagram illustrating a structure of a training apparatus for a neural network according to an embodiment of the present invention, and as shown in fig. 13, the training apparatus 1300 includes:
an obtaining unit 1301, configured to obtain an input data set;
a predicting unit 1302, configured to input the input data set into a neural network to be trained, so as to obtain a predicted value of the neural network to be trained;
a determining unit 1303, configured to determine a loss function of the neural network to be trained based on the predicted value;
an updating unit 1304, configured to update a model parameter of the neural network to be trained based on the loss function;
the neural network to be trained is obtained by searching based on the neural network structure searching method provided by the embodiment of the invention.
In actual applications, the obtaining unit 1301, the predicting unit 1302, the determining unit 1303, and the updating unit 1304 may be implemented by a processor in a structure searching apparatus of a neural network.
It should be noted that: in the training apparatus for a neural network according to the above embodiment, when the neural network is trained, only the division of the program modules is taken as an example, and in practical applications, the processing may be distributed to different program modules according to needs, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the training apparatus of the neural network provided in the above embodiments and the training method of the neural network belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.
Based on the hardware implementation of the program module, and in order to implement the method according to the embodiment of the present invention, an embodiment of the present invention further provides an electronic device 1400, where the electronic device 1400 includes:
a memory 1401 for storing executable instructions;
the processor 1402 is configured to implement the structure search method of the neural network provided in the embodiment of the present invention or implement the training method of the neural network provided in the embodiment of the present invention when executing the executable instructions stored in the memory.
In practice, as shown in fig. 14, the various components of the electronic device 1400 are coupled together by a bus system 1403. It is understood that bus system 1403 is used to enable connection communication between these components. The bus system 1403 includes a power bus, a control bus, and a status signal bus in addition to the data bus. For clarity of illustration, however, the various buses are labeled in fig. 14 as bus system 1403.
The embodiment of the present invention further provides a storage medium, where the storage medium stores executable instructions, and when the executable instructions are executed by at least one processor, the method for searching a structure of a neural network provided in the embodiment of the present invention is implemented, or the method for training a neural network provided in the embodiment of the present invention is implemented.
In some embodiments, the storage medium may be a memory such as a ferroelectric Random Access Memory (FRAM), a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In order to better illustrate that the scheme of the embodiment of the invention can ensure that the performance of the network structure approaches or exceeds the design level of an expert on the premise of automatically completing the network structure search, a corresponding comparison experiment is carried out.
Experiment 1: performance comparison experiment
Performance comparison experiments were conducted between the network-structure-search-based model ST-NASNet of the present invention, the expert-designed network structure ST-ResNet, and the leading network structure search models in the current image and Natural Language Processing (NLP) fields, Darts and ENAS; the comparison results are shown in Table 1:
TABLE 1
As can be seen from table 1: ST-NASNet outperforms the expert-designed network structure ST-ResNet, and its results are close to the best results of Darts. Furthermore, the performance of ENAS is particularly unstable.
Experiment 2: time consuming comparative experiment
Time-consumption comparison experiments were conducted between the network-structure-search-based model ST-NASNet of the present invention and Darts and ENAS; the comparison results are shown in Table 2:
TABLE 2
As can be seen from table 2: with comparable storage consumption, the time consumption of ST-NASNet is much less than that of Darts and ENAS.
Experiment 3: sensitivity comparison experiment
Sensitivity comparison experiments were conducted between the network-structure-search-based model ST-NASNet of the present invention and the expert-designed network structure ST-ResNet. Figs. 15a to 15i show, respectively: the comparisons of root mean square error, mean absolute error, and mean absolute percentage error for different convolution filters (filters); the same three comparisons for different learning rates (lr); and the same three comparisons for different test set sizes.
As can be seen from figs. 15a to 15i: for different convolution filters, different learning rates and different test set sizes, the network-structure-search-based model ST-NASNet achieves better results than the expert design.
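The three metrics compared in figs. 15a to 15i can be computed as follows (a standard sketch; the eps guard against division by zero in empty grid cells is an assumption, as the text does not state how zero flows are handled):

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred, eps=1e-8):
    return float(np.mean(np.abs((y_true - y_pred) / (y_true + eps))) * 100)
```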
It should be noted that: "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
In addition, the technical solutions described in the embodiments of the present invention may be arbitrarily combined without conflict.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (11)

1. A method of structure search of a neural network, the method comprising:
acquiring an input data set, and dividing the input data set into a training set and a verification set; the input data set comprises a class diagram network matrix of a plurality of traffic flows of a preset area, wherein the class diagram network matrix is ordered according to time stamps; the training set and the validation set are partitioned according to a timestamp of each data in the input data set;
receiving a preset network architecture, a structure search variable and a structure search complete set of the structure search variable of a neural network to be determined in a traffic flow prediction model; wherein the traffic flow prediction model includes at least a first neural network for proximity features, a second neural network for periodic features, and a third neural network for trending features; the first neural network, the second neural network and the third neural network are all neural networks to be determined; the preset network architecture comprises an input network layer, an output network layer and at least one intermediate network layer; the current intermediate network layer is linked with the previous network layer of the current intermediate network layer through mixed convolution operation; the current intermediate network layer is linked with each layer of network layer before the current intermediate network layer through hybrid connection operation;
based on the preset network architecture, linking each intermediate network layer in at least one intermediate network layer by using the structure search variable and the structure search complete set of the structure search variable to obtain the output of the neural network to be determined;
determining a first loss function based on the output of the neural network to be determined and a training set; determining a second loss function based on the output of the neural network to be determined and a validation set;
updating model parameters of the neural network to be determined based on the first loss function; updating structural parameters of the neural network to be determined based on the second loss function;
determining the structure of the neural network to be determined by using the updated model parameters and the updated structure parameters;
wherein the first neural network, the second neural network and the third neural network determined by the search are used in sequence to form the traffic flow prediction model; and the traffic flow prediction model is capable of predicting the traffic flow of the preset area.
2. The method according to claim 1, wherein the obtaining the output of the neural network to be determined by linking each of the at least one intermediate network layer with a structure search variable and a structure search corpus based on a preset network architecture comprises:
obtaining the output of the hybrid convolution operation of the current intermediate network layer by utilizing a structure controller and a model parameter controller; the hybrid convolution operation includes a plurality of basic convolution operations; the structure controller is the weight value of each candidate quantity included in the structure search variables; the model parameter controller is the weight value corresponding to the convolution kernel of each basic convolution operation;
obtaining, by the structure controller, an output of the hybrid connection operation of the current intermediate network layer;
obtaining the output of the current intermediate network layer by using the output of the hybrid convolution operation and the output of the hybrid connection operation;
and obtaining the output of the neural network to be determined by utilizing the output of the current intermediate network layer.
3. The method of claim 2, wherein obtaining the output of the hybrid convolution operation of the current intermediate network layer by utilizing the structure controller and the model parameter controller comprises:
summing products of each basic convolution operation and the weighted value corresponding to each basic convolution operation to obtain the output of the mixed convolution operation of the current intermediate network layer; the structure controller comprises a weight value corresponding to each basic convolution operation; the basic convolution operation is obtained using the model parameter controller and the output of an intermediate network layer previous to the current intermediate network layer.
4. The method of claim 2, wherein said obtaining, by the structure controller, an output of the hybrid connection operation of the current intermediate network layer comprises:
summing the products of each basic connection operation and the weight value corresponding to each basic connection operation to obtain the output of the hybrid connection operation of the current intermediate network layer; the hybrid connection operation comprises a plurality of basic connection operations; and the structure controller includes a weight value corresponding to each basic connection operation.
5. The method of claim 1,
the updating of the model parameters of the neural network to be determined based on the first loss function includes:
updating the model parameters of the neural network to be determined by using a gradient descent method based on the first loss function;
the updating of the structural parameters of the neural network to be determined based on the second loss function comprises:
and updating the structural parameters of the neural network to be determined by using a gradient descent method based on the second loss function.
6. The method of claim 1, wherein determining the structure of the neural network to be determined using the updated model parameters and the updated structure parameters comprises:
circularly updating the model parameters and the structural parameters until the first loss function and the second loss function are converged;
and determining the structure of the neural network to be determined based on the model parameters and the structure parameters when the first loss function and the second loss function are converged.
7. An apparatus for searching a structure of a neural network, the apparatus comprising:
a first acquisition unit for acquiring an input data set and dividing the input data set into a training set and a verification set; the input data set comprises a class diagram network matrix of a plurality of traffic flows of a preset area, wherein the class diagram network matrix is ordered according to time stamps; the training set and the validation set are partitioned according to a timestamp of each data in the input data set;
the receiving unit is used for receiving a preset network architecture of a neural network to be determined in the traffic flow prediction model, a structure search variable and a structure search complete set of the structure search variable; wherein the traffic flow prediction model includes at least a first neural network for proximity features, a second neural network for periodic features, and a third neural network for trending features; the first neural network, the second neural network and the third neural network are all neural networks to be determined; the preset network architecture comprises an input network layer, an output network layer and at least one intermediate network layer; the current intermediate network layer is linked with the previous network layer of the current intermediate network layer through mixed convolution operation; the current intermediate network layer is linked with each layer of network layer before the current intermediate network layer through hybrid connection operation;
the first determining unit is used for linking each intermediate network layer in at least one intermediate network layer by using the structure search variable and the structure search complete set of the structure search variable based on the preset network architecture to obtain the output of the neural network to be determined;
a second determining unit, configured to determine a first loss function based on an output of the neural network to be determined and a training set; determining a second loss function based on the output of the neural network to be determined and a validation set;
an updating unit, configured to update a model parameter of the neural network to be determined based on the first loss function; updating structural parameters of the neural network to be determined based on the second loss function;
the third determining unit is used for determining the structure of the neural network to be determined by using the updated model parameters and the updated structure parameters; the method comprises the following steps that a first neural network, a second neural network and a third neural network which are determined by searching in sequence are used for forming a traffic flow prediction model; the traffic flow prediction model can predict the traffic flow of the preset area.
8. A training method of a traffic flow prediction model is characterized by comprising the following steps:
acquiring an input data set; the input data set comprises a class diagram network matrix of a plurality of traffic flows of a preset area, wherein the class diagram network matrix is ordered according to time stamps;
inputting the input data set into a traffic flow prediction model to obtain a prediction value of the traffic flow prediction model;
determining a loss function of the traffic flow prediction model based on the predicted value;
updating model parameters of the traffic flow prediction model based on the loss function;
wherein the traffic flow prediction model is formed based on a neural network searched by the method of any one of claims 1 to 6; the traffic flow prediction model can predict the traffic flow of the preset area.
9. An apparatus for training a traffic flow prediction model, the apparatus comprising:
an acquisition unit for acquiring an input data set; the input data set comprises a class diagram network matrix of a plurality of traffic flows of a preset area, wherein the class diagram network matrix is ordered according to time stamps;
the prediction unit is used for inputting the input data set into a traffic flow prediction model to obtain a prediction value of the traffic flow prediction model;
a determination unit configured to determine a loss function of the traffic flow prediction model based on the prediction value;
an updating unit configured to update a model parameter of the traffic flow prediction model based on the loss function;
wherein the traffic flow prediction model is formed based on a neural network searched by the method of any one of claims 1 to 6; the traffic flow prediction model can predict the traffic flow of the preset area.
10. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing the neural network structure search method of any one of claims 1 to 6 or the traffic flow prediction model training method of claim 8 when executing the executable instructions stored in the memory.
11. A storage medium storing executable instructions which, when executed by at least one processor, implement the method of structure search of a neural network according to any one of claims 1 to 6, or implement the method of training a traffic prediction model according to claim 8.
CN201910943886.6A 2019-09-30 2019-09-30 Neural network structure searching method, training method, device and storage medium Active CN110751267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910943886.6A CN110751267B (en) 2019-09-30 2019-09-30 Neural network structure searching method, training method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110751267A CN110751267A (en) 2020-02-04
CN110751267B true CN110751267B (en) 2021-03-30

Family

ID=69277648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910943886.6A Active CN110751267B (en) 2019-09-30 2019-09-30 Neural network structure searching method, training method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110751267B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353601A (en) * 2020-02-25 2020-06-30 北京百度网讯科技有限公司 Method and apparatus for predicting delay of model structure
CN113642592A (en) * 2020-04-27 2021-11-12 武汉Tcl集团工业研究院有限公司 Training method of training model, scene recognition method and computer equipment
CN111882048A (en) * 2020-09-28 2020-11-03 深圳追一科技有限公司 Neural network structure searching method and related equipment
CN112149691B (en) * 2020-10-10 2021-10-15 北京鹰瞳科技发展股份有限公司 Neural network searching method and device for binocular vision matching
CN112784140B (en) * 2021-02-03 2022-06-21 浙江工业大学 Search method of high-energy-efficiency neural network architecture

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247991A (en) * 2017-06-15 2017-10-13 北京图森未来科技有限公司 A kind of method and device for building neutral net
CN108805259A (en) * 2018-05-23 2018-11-13 北京达佳互联信息技术有限公司 neural network model training method, device, storage medium and terminal device
CN109165727A (en) * 2018-09-04 2019-01-08 成都品果科技有限公司 A kind of data predication method based on Recognition with Recurrent Neural Network
CN109284820A (en) * 2018-10-26 2019-01-29 北京图森未来科技有限公司 A kind of search structure method and device of deep neural network
CN109299142A (en) * 2018-11-14 2019-02-01 中山大学 A kind of convolutional neural networks search structure method and system based on evolution algorithm
CN109615073A (en) * 2018-12-03 2019-04-12 郑州云海信息技术有限公司 A kind of construction method of neural network model, equipment and storage medium
CN110020667A (en) * 2019-02-21 2019-07-16 广州视源电子科技股份有限公司 Searching method, system, storage medium and the equipment of neural network structure
CN110059206A (en) * 2019-03-29 2019-07-26 银江股份有限公司 A kind of extensive hashing image search method based on depth representative learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978857A (en) * 2015-05-26 2015-10-14 重庆邮电大学 Traffic state prediction method based on chaos theory and device thereof


Also Published As

Publication number Publication date
CN110751267A (en) 2020-02-04

Similar Documents

Publication Publication Date Title
CN110751267B (en) Neural network structure searching method, training method, device and storage medium
WO2021175058A1 (en) Neural network architecture search method and apparatus, device and medium
Gero et al. An exploration‐based evolutionary model of a generative design process
CN109887284B (en) Smart city traffic signal control recommendation method, system and device
CN110222848A (en) The determination method and device for the integrated model that computer executes
CN111611488B (en) Information recommendation method and device based on artificial intelligence and electronic equipment
CN113901730B (en) Network target range construction method and system based on parallel simulation
CN112215269A (en) Model construction method and device for target detection and neural network architecture
KR950007882B1 (en) Apparatus and method for automatically generating membership function and/or fuzzy inference rule for fuzzy inference system
CN113159115A (en) Vehicle fine-grained identification method, system and device based on neural architecture search
CN116706907B (en) Photovoltaic power generation prediction method based on fuzzy reasoning and related equipment
CN111507499B (en) Method, device and system for constructing model for prediction and testing method
Chen et al. A Spark-based Ant Lion algorithm for parameters optimization of random forest in credit classification
WO2022117127A2 (en) Engineering forklift multi-objective performance optimization method based on deep surrogate model
CN111697560A (en) Method and system for predicting load of power system based on LSTM
CN111210088B (en) Traffic state index prediction method based on space-time factors
CN116662815B (en) Training method of time prediction model and related equipment
Sachdeva et al. Gapformer: Fast autoregressive transformers meet rnns for personalized adaptive cruise control
Cárdenas et al. Tips and tools to automate OMNeT++ simulations and to facilitate post data management tasks
Liyanage et al. Interpretability in the context of sequential cost-sensitive feature acquisition
CN117131999B (en) Digital twin-based rail transit passenger flow prediction system and method thereof
WO2023082045A1 (en) Neural network architecture search method and apparatus
Lemus Cárdenas et al. Tips and tools to automate OMNeT++ simulations and to facilitate post data management tasks
KR20240059092A (en) Automated simulation method based on database in semiconductor design process, and automated simulation generation device and semiconductor design automation system performing the same
Kuftinova et al. Large Language Model in Suburban Transport Data Management

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant