CN111582452B - Method and device for generating neural network model - Google Patents

Method and device for generating neural network model

Info

Publication number
CN111582452B
CN111582452B
Authority
CN
China
Prior art keywords
network
super
sub
training
neural network
Prior art date
Legal status
Active
Application number
CN202010387565.5A
Other languages
Chinese (zh)
Other versions
CN111582452A (en)
Inventor
希滕
张刚
温圣召
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010387565.5A
Publication of CN111582452A
Application granted
Publication of CN111582452B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the field of artificial intelligence and discloses a method and a device for generating a neural network model. The method comprises the following steps: constructing a super network based on the structure of a target neural network model, wherein each layer of the super network comprises a candidate structure unit set corresponding to the respective layer of the target neural network model, and each candidate structure unit set comprises the network structure unit of the corresponding layer in the target neural network model and at least one candidate structure unit similar to that network structure unit; initializing the super network and training it based on sample data of a preset domain and the candidate structure unit sets corresponding to the layers of the super network; and synchronizing parameters of the target sub-network corresponding to the target neural network model in the trained super network to the target neural network model. The method realizes optimization of the target neural network model.

Description

Method and device for generating neural network model
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to the field of artificial intelligence technology, and more particularly, to a method and apparatus for generating a neural network model.
Background
With the development of artificial intelligence and data storage technology, deep neural networks have achieved important results in many task domains. A deep neural network can be applied to perform the corresponding deep learning task after training, and the training effect depends on a large amount of training data. In some scenarios training data is difficult to acquire; for example, infrared images and depth images are not easy to collect, so a large-scale sample data set cannot be constructed to optimize the performance of the corresponding deep neural network.
Current methods transfer a neural network model trained in a domain with abundant samples to a domain with few samples. For example, a face recognition model trained on color face images is applied to recognition of infrared face images. However, because the data of different domains differ substantially, the performance of a neural network model trained in the sample-rich domain is often not ideal in the sample-scarce domain.
Disclosure of Invention
Embodiments of the present disclosure provide methods and apparatus, electronic devices, and computer-readable storage media for generating neural network models.
According to a first aspect, there is provided a method of generating a neural network model, comprising: constructing a super network based on the structure of the target neural network model, wherein each layer of the super network comprises a candidate structure unit set corresponding to each layer of the target neural network model respectively, and the candidate structure unit set comprises network structure units of corresponding layers in the structure of the target neural network model and at least one candidate structure unit similar to the network structure units of corresponding layers in the structure of the target neural network model; initializing a super network, and training the super network based on sample data of a preset domain and a candidate structure unit set corresponding to each layer of the super network; and synchronizing parameters of the target sub-network corresponding to the target neural network model in the trained super-network to the target neural network model.
According to a second aspect, there is provided an apparatus for generating a neural network model, comprising: a construction unit configured to construct a super network based on the structure of the target neural network model, each layer of the super network including a set of candidate structural units corresponding to each layer of the target neural network model, respectively, and the set of candidate structural units including network structural units of a corresponding layer in the structure of the target neural network model and at least one candidate structural unit similar to the network structural units of the corresponding layer in the structure of the target neural network model; the training unit is configured to initialize the super network and train the super network based on sample data of a preset domain and a candidate structure unit set corresponding to each layer of the super network; and the synchronization unit is configured to synchronize parameters of a target sub-network corresponding to the target neural network model in the trained super-network to the target neural network model.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating a neural network model provided in the first aspect.
According to a fourth aspect, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of generating a neural network model provided by the first aspect.
According to the technology disclosed in the present application, a super network is constructed based on the structure of the target neural network model and trained, thereby achieving parameter optimization of the target neural network model.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is a flow chart of one embodiment of a method of generating a neural network model of the present disclosure;
FIG. 2 is a flow diagram of one training method of the super network in an embodiment of the present disclosure;
FIG. 3 is a flow diagram of another training method of the super network in an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of one embodiment of an apparatus of the present disclosure for generating a neural network model;
Fig. 5 is a block diagram of an electronic device for implementing a method of generating a neural network model of an embodiment of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The method or apparatus of the present disclosure may be applied to a terminal device or a server, or may be applied to a system architecture including a terminal device, a network, and a server. The medium used by the network to provide a communication link between the terminal device and the server may include various connection types, such as a wired, wireless communication link, or fiber optic cable, among others.
The terminal device may be a user-end device on which various client applications may be installed, such as image processing applications, search applications, and voice service applications. The terminal device may be hardware or software. When it is hardware, it may be any of a variety of electronic devices, including but not limited to smartphones, tablets, e-book readers, laptop computers, and desktop computers. When it is software, it may be installed in the electronic devices listed above and may be implemented as multiple software programs or software modules, or as a single software program or module; no specific limitation is imposed here.
The server may be a server running various services, for example a service that performs object detection and recognition, text or speech recognition, or signal conversion on data such as images, video, voice, text, and digital signals. The server may obtain various media data, such as image data, audio data, and text data, as training sample data for a deep learning task. The server can acquire the structural information of the target neural network model whose parameters are to be optimized, construct a super network according to the structure of the target neural network model, train the super network, and obtain optimized parameters of the target neural network model based on the trained super network.
The server can also send the determined data such as the structure, parameters and the like of the target neural network model to the terminal equipment. And the terminal equipment deploys and runs the neural network model locally according to the received data so as to execute the corresponding deep learning task.
The server may be hardware or software. When the server is hardware, the server may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (e.g., a plurality of software or software modules for providing distributed services), or as a single software or software module. The present invention is not particularly limited herein.
It should be noted that, the method for generating the neural network model provided by the embodiment of the present disclosure may be performed by a terminal device or a server, and accordingly, the apparatus for generating the neural network model may be provided in the terminal device or the server.
Referring to fig. 1, a flow 100 of one embodiment of a method of generating a neural network model according to the present disclosure is shown. The method for generating the neural network model comprises the following steps:
and step 101, constructing a super network based on the structure of the target neural network model.
In this embodiment, the structural information of the target neural network model may be acquired first. The target neural network model is the neural network model to be optimized; for example, it may be a neural network model deployed online for performing a deep learning task of a preset domain, such as image processing, text translation, or speech recognition.
Here, the preset domain corresponds to the data type or the data generation manner of the data processed by the target neural network model, or to the application scenario of the target neural network model. For example, the preset domain corresponds to a particular format or collection mode of data. Optionally, the preset domain is a domain in which the amount of obtainable sample data does not exceed a preset number, whereas other domains with a different data type or data generation manner have obtainable sample data exceeding that number. As an example, the preset domain corresponds to infrared image or depth image processing, while the number of available images in other domains (e.g., the color image domain) is typically large.
The network structural units of each layer of the target neural network model can be determined from the structural information of the target neural network model. Here, one network structural unit corresponds to one operation in the neural network model, such as a convolution operation or a pooling operation; a network structural unit may also correspond to a repeatable module in the neural network model, such as a residual module in ResNet. Different convolution operations may correspond to different sizes and/or parameters; for example, if one convolution layer of the target neural network model contains a 3×3 convolution kernel and another contains a 5×5 convolution kernel, then the two convolution layers correspond to different convolution operations. Likewise, different repeatable modules differ in structure and/or parameters.
Each layer of the super network comprises a candidate structure unit set corresponding to each layer of the target neural network model, and the candidate structure unit set comprises network structure units of corresponding layers in the structure of the target neural network model and at least one candidate structure unit similar to the network structure units of the corresponding layers in the structure of the target neural network model.
Specifically, for the network structural unit of each layer in the target neural network model, a candidate structural unit similar in structure to that unit may be constructed and added to the candidate structural unit set of the corresponding layer in the super network, and the network structural unit of that layer in the target neural network model is itself added to the same set as one of its candidate structural units. For example, for a network structural unit corresponding to a 3×3 convolution operation, a network structural unit corresponding to a 5×5 convolution operation is constructed, and both units are used as candidate structural units of the corresponding layer in the super network.
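To make the construction step concrete, the following sketch (hypothetical PyTorch-style code; the class and helper names are assumptions, not part of the patent) expands a target model whose layers are ordinary modules into super-network layers, where each layer holds a copy of the original unit at index 0 plus structurally similar candidates:

```python
import copy
import torch.nn as nn

def similar_candidates(unit: nn.Module):
    """Build candidate structural units similar to a target-model unit.
    Only a larger-kernel convolution is added here; a real system could add
    further variants (other kernel sizes, residual blocks, and so on)."""
    candidates = []
    if isinstance(unit, nn.Conv2d):
        candidates.append(nn.Conv2d(unit.in_channels, unit.out_channels,
                                    kernel_size=5, padding=2))
    return candidates

class SuperNetLayer(nn.Module):
    """One super-network layer: the original unit (index 0) plus similar candidates."""
    def __init__(self, target_unit: nn.Module):
        super().__init__()
        self.candidates = nn.ModuleList(
            [copy.deepcopy(target_unit)] + similar_candidates(target_unit))

    def forward(self, x, choice):
        # `choice` selects which candidate structural unit of this layer is active
        return self.candidates[choice](x)

class SuperNet(nn.Module):
    """Super network built by expanding each layer of the target model."""
    def __init__(self, target_units):
        super().__init__()
        self.layers = nn.ModuleList([SuperNetLayer(u) for u in target_units])

    def forward(self, x, choices):
        # `choices` holds one candidate index per layer, i.e. one sub-network
        for layer, c in zip(self.layers, choices):
            x = layer(x, c)
        return x
```

Keeping the original unit at a fixed index (0 in this sketch) makes it straightforward to identify the target sub-network later when synchronizing parameters.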
Step 102, initializing the super network, and training the super network based on sample data of a preset domain and a candidate structure unit set corresponding to each layer of the super network.
A connection may be established between any two candidate structural units located in adjacent layers of the super network. By sampling the candidate structural units of each layer of the super network, the connections between adjacent layers are sampled accordingly, yielding a complete neural network model. For example, sampling candidate structural unit A in layer X and candidate structural unit B in the adjacent layer Y corresponds to sampling the connection between A and B.
After the super network is constructed, parameters corresponding to the respective connections of the super network may be initialized.
The super network is then trained using the acquired sample data of the preset domain. Specifically, candidate structural units of each layer of the super network may be repeatedly sampled to form multiple sub-networks, each sampled sub-network is used to execute the corresponding deep learning task, and the parameters of the super network are iteratively updated according to the performance of the different sub-networks on that task, until the update rate of the super network parameters falls below a threshold or the accuracy measure of the super network reaches a convergence condition.
Because the super network is generated by expanding the network structural units of each layer of the target neural network model, it contains a large number of sub-network structures. During super network training, many sub-networks, including a sub-network whose structure is consistent with the target neural network model, are trained simultaneously on the sample data of the preset domain, and their parameters constrain one another. This alleviates the over-fitting that would occur if a single target neural network model were trained directly on a small amount of sample data of the preset domain, so the parameters of the sub-network consistent with the target neural network model structure in the trained super network do not over-fit the small sample set.
And step 103, synchronizing parameters of the target sub-network corresponding to the target neural network model in the trained super-network to the target neural network model.
After the super network is trained, the parameters corresponding to each connection contained in the target neural network model can be synchronized from the trained super network to the target neural network model; that is, the target neural network model is sampled from the super network, yielding the trained target neural network model, i.e., the neural network model generated by the method of this embodiment.
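As a minimal sketch of this synchronization step, assuming the super network above and that the original unit of every layer was stored at candidate index 0, the trained parameters of the target sub-network could be copied back layer by layer (the function and argument names are illustrative assumptions):

```python
def sync_target_parameters(supernet, target_units, target_choices):
    """Copy the trained parameters of the target sub-network (one chosen
    candidate per super-network layer) back into the target model's units."""
    for layer, unit, choice in zip(supernet.layers, target_units, target_choices):
        unit.load_state_dict(layer.candidates[choice].state_dict())

# Usage: the original units were inserted at index 0 of every candidate set,
# so the target sub-network corresponds to choice 0 in each layer.
# sync_target_parameters(supernet, target_units, [0] * len(target_units))
```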
In the method for generating a neural network model of this embodiment, the super network is trained and the parameters corresponding to the connections contained in the target neural network model are synchronized from the super network to the target neural network model. This effectively mitigates over-fitting during training of the target neural network and improves its performance. The method can therefore train a target neural network model that performs well and is not prone to over-fitting, even using the limited sample data of the preset domain.
In some embodiments, the above-described supernetwork may be trained by iteratively performing a plurality of training operations. Referring to fig. 2, a flow chart of a training method of a super network is shown, wherein the training method of the super network includes performing a plurality of training operations in an iterative manner.
Specifically, the training operation may include:
step 201, a sub-network set of the current training operation is generated by sampling candidate structural units in the candidate structural unit set corresponding to each layer of the super-network.
A candidate structural unit can be randomly sampled from the candidate structural unit set corresponding to each layer of the super network, and the sampled candidate structural units of the layers are connected to form a sub-network. Sub-network sampling may be performed on the super network multiple times to form the sub-network set of the current training operation.
Alternatively, a strategy of balanced sampling of the candidate structural units corresponding to each layer of the super network may be adopted to sample the sub-network set of the current training operation from the super network. A balanced sampling strategy means that, for each layer of the super network, the sampling probability or the number of times each candidate structural unit is sampled is equal. Balanced sampling may be used as a constraint on sub-network sampling in each training operation when sampling the sub-network set of the current training operation from the super network. In this way, the parameters corresponding to every connection of the super network are trained equally during super network training, avoiding an inaccurate super network caused by training the parameters of some connections too few times.
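A minimal sketch of one possible balanced sampling scheme follows, assuming each layer's candidates are identified by integer indices and that "balanced" means each candidate index is drawn an (approximately) equal number of times within one training operation; the function name is an assumption for illustration:

```python
import random

def balanced_sample(num_layers, candidates_per_layer, num_subnets):
    """Sample `num_subnets` sub-networks so that, within one training operation,
    each candidate index of each layer is drawn an (approximately) equal number
    of times."""
    pools = []
    for n in candidates_per_layer:
        reps = -(-num_subnets // n)                    # ceiling division
        pool = (list(range(n)) * reps)[:num_subnets]   # near-equal counts per candidate
        random.shuffle(pool)
        pools.append(pool)
    # Deal the per-layer indices out column-wise: one column = one sub-network.
    return [[pools[l][i] for l in range(num_layers)] for i in range(num_subnets)]

# Example: 3 layers with 2 candidates each and 4 sub-networks per training
# operation -> every candidate of every layer is selected exactly twice.
# subnet_choices = balanced_sample(3, [2, 2, 2], 4)
```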
Step 202, training the sub-networks in the sub-network set based on the sample data of the preset domain, and determining the current performance of the super-network according to the training result of the sub-networks.
Each sub-network in the current sub-network set may be trained separately using sample data of the preset domain, for example image data acquired under specific imaging conditions (an infrared light source, a depth camera, etc.). During sub-network training, the performance of the sub-network is gradually improved through repeated iterative parameter adjustment. After each sub-network has been trained, its performance can be tested and used as its training result. The performance results of all sub-networks in the set can then be aggregated, for example by a weighted sum or an average of the sub-networks' performance indicators, and the aggregated result is used as the current performance of the super network.
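For example, the aggregation could be a weighted average of the sub-networks' evaluation metrics; the helper below is an illustrative assumption, not the patent's prescribed formula:

```python
def supernet_performance(subnet_metrics, weights=None):
    """Aggregate per-sub-network evaluation results (e.g. accuracy on held-out
    preset-domain samples) into the current performance of the super network
    via a weighted average; equal weights give a plain mean."""
    if weights is None:
        weights = [1.0] * len(subnet_metrics)
    return sum(w * m for w, m in zip(weights, subnet_metrics)) / sum(weights)
```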
In step 203, in response to determining that the current performance of the super network does not meet the preset convergence condition, the parameters of the super network are updated based on the current performance of the super network, and the next training operation is performed.
Whether the preset convergence condition is met may be determined according to the current performance of the super network. Here, the preset convergence condition may include at least one of the following: the current performance index of the super network reaches a preset performance index threshold; the rate of increase of the current performance of the super network compared with the performance in the previous training operation does not exceed a preset rate-of-increase threshold; or the rate of increase of the super network's performance over a preset number of consecutive training operations does not exceed a preset rate-of-increase threshold. Alternatively, in some embodiments, when the accumulated number of training operations performed on the super network reaches a preset number threshold, it is determined that the current performance of the super network does not satisfy the preset convergence condition.
If the current performance of the super network does not meet the preset convergence condition, the current performance can be fed back, the parameters of the super network may be updated by a method such as back propagation, and the next training operation is performed on the super network with the updated parameters.
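Putting steps 201 to 203 together, one possible simplified form of the iterative training operation is sketched below. It reuses the balanced_sample and supernet_performance helpers above, treats accuracy as the performance index, and updates the shared super-network parameters by back-propagating the sub-networks' losses; this is an assumed concretization, not the only implementation covered by the method.

```python
import torch
import torch.nn as nn

def train_supernet(supernet, sample_loader, num_subnets, max_operations,
                   performance_threshold, lr=0.01):
    """Iterative training operations (simplified sketch): sample a sub-network
    set with balanced sampling, run each sub-network on preset-domain samples,
    aggregate accuracy as the super network's current performance, and update
    the shared parameters until the convergence condition is met."""
    candidates_per_layer = [len(layer.candidates) for layer in supernet.layers]
    optimizer = torch.optim.SGD(supernet.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    for _ in range(max_operations):
        choices_set = balanced_sample(len(supernet.layers),
                                      candidates_per_layer, num_subnets)
        losses, metrics = [], []
        for choices in choices_set:                  # one sampled sub-network
            for x, y in sample_loader:               # preset-domain sample data
                logits = supernet(x, choices)
                losses.append(criterion(logits, y))
                metrics.append((logits.argmax(dim=1) == y).float().mean().item())
        if supernet_performance(metrics) >= performance_threshold:
            break                                    # preset convergence condition met
        optimizer.zero_grad()
        torch.stack(losses).mean().backward()        # back-propagate to shared parameters
        optimizer.step()
    return supernet
```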
By training the super network in this way, the parameters corresponding to every connection in the super network can be optimized jointly, so that the parameters of the target neural network model are gradually optimized during super network training.
With continued reference to fig. 3, a flow diagram of another training method of the super network is shown. In this embodiment, the super network is trained by iteratively performing training operations.
As shown in fig. 3, the training operation includes:
step 301, generating a sub-network set of the current training operation by sampling candidate structural units in the candidate structural unit set corresponding to each layer of the super-network.
Step 302, training the sub-networks in the sub-network set based on the sample data of the preset domain, and determining the current performance of the super-network according to the training result of the sub-networks.
Step 301 and step 302 in this embodiment are consistent with step 201 and step 202 in the foregoing embodiments, and specific implementation manners of step 301 and step 302 may refer to descriptions of step 201 and step 202 in the foregoing embodiments, respectively, which are not repeated herein.
Step 303, determining the current gradient value of the training supervision function of the super network according to the training result of the sub network, and storing the current gradient value of the training supervision function of the super network into the gradient value set.
In this embodiment, the current gradient value of the training supervision function of the super network may be calculated according to the training results of the sub-networks, such as their performance indices. The training supervision function of the super network is used to supervise the training of the super network and may be a pre-constructed function. As an example, it may be constructed as an accumulation or a weighted sum of the prediction errors of the respective sub-networks on the sample data.
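Written out under these assumptions (K sampled sub-networks s_1, ..., s_K with weights α_k, a preset-domain sample set D, and a per-sample loss ℓ), such a training supervision function could take the form:

```latex
L_{\text{super}}(W) = \sum_{k=1}^{K} \alpha_k \, L_k(W_{s_k}), \qquad
L_k(W_{s_k}) = \frac{1}{|D|} \sum_{(x, y) \in D} \ell\bigl(f_{s_k}(x; W_{s_k}),\, y\bigr)
```

Here W_{s_k} denotes the subset of super-network parameters used by sub-network s_k, and setting every α_k = 1 gives the plain accumulation case.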
A gradient descent method may be employed to iteratively adjust the parameters of the super network. The current gradient value of the training supervision function of the super network is calculated and stored in a gradient value set, where the initial number of gradient values in the set is 0 and the preset number threshold may be an empirically set value, for example 10 or 100.
Step 304, in response to determining that the current performance of the super-network does not meet the preset convergence condition, and the number of gradient values in the gradient value set reaches the preset number threshold, calculating an average gradient value based on each gradient value in the gradient value set, updating parameters of the super-network based on the average gradient value, deleting each gradient value in the gradient value set, and executing the next training operation.
When the current performance of the super network does not meet the preset convergence condition and the number of gradient values in the gradient value set reaches the preset number threshold, the average of the gradient values in the set is calculated and fed back to update the parameters of the super network. In this way, the parameters of the super network are updated based on gradients accumulated over multiple training operations. After the average gradient value is calculated, the gradient values in the set are deleted, resetting the number of gradient values to 0 so that stored gradient values accumulate afresh in subsequent training operations; deleting the gradient values thus controls the repeated iterative training of the super network across training operations.
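A minimal sketch of this accumulate-then-average update, assuming the per-operation gradients are kept as a tuple of per-parameter tensors (the function and variable names are assumptions for illustration):

```python
import torch

def apply_accumulated_gradients(supernet, optimizer, gradient_set,
                                current_gradients, number_threshold):
    """Store the current gradient values of the training supervision function;
    once the gradient value set reaches the preset number threshold, update the
    super-network parameters with the average gradient and clear the set."""
    gradient_set.append(current_gradients)            # one entry per training operation
    if len(gradient_set) < number_threshold:          # e.g. 10 or 100
        return                                        # keep accumulating
    for param, grads in zip(supernet.parameters(), zip(*gradient_set)):
        param.grad = torch.stack(list(grads)).mean(dim=0)   # average gradient value
    optimizer.step()                                  # update super-network parameters
    optimizer.zero_grad()
    gradient_set.clear()                              # reset the gradient value set to 0

# `current_gradients` is assumed to be a tuple of tensors, one per parameter,
# e.g. obtained with torch.autograd.grad(supervision_loss, supernet.parameters()).
```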
Because the number of sub-networks sampled per training operation is limited, the sub-network set sampled in a single training operation may not cover all connections and all candidate structural units in the super network. Updating the parameters of the super network with the average of the gradient values of the training supervision function over multiple training operations therefore improves the accuracy of the supervision function's values and allows the super network to be updated based on more candidate structural units and connections, improving the accuracy of the super network.
Optionally, the training operation may further include: and in response to determining that the current performance of the super-network does not meet the preset convergence condition and the number of gradient values in the gradient value set does not reach the preset number threshold, performing the next training operation.
When the current performance of the super-network does not meet the preset convergence condition, if the number of gradient values stored in the gradient value set does not reach the preset number threshold, the step 301 may be returned to execute the next training operation.
Optionally, the training process of the super network of fig. 2 and fig. 3 may further include:
and in response to determining that the current performance of the super network meets a preset convergence condition, determining the current super network as the trained super network. That is, training may be stopped upon convergence of the super network.
Optionally, in each training operation of the training process of the super network of fig. 3, the generating the sub-network set of the current training operation by sampling the candidate structural units in the candidate structural unit set corresponding to each layer of the super network includes:
in response to determining that, in the current training operation, the difference between the number of gradient values in the gradient value set and the preset number threshold is 1, and that none of the sub-network sets sampled in the training operations corresponding to the gradient values in the set contains a sub-network consistent with the target neural network model structure, sampling from the super network the sub-network formed by the candidate structural units corresponding to each layer of the target neural network model structure and adding it to the sub-network set of the current training operation.
Specifically, assume the preset number threshold is N. If the number of gradient values stored in the gradient value set before the current training operation is N-1, and none of the sub-network sets sampled in the N-1 preceding training operations includes the sub-network corresponding to the structure of the target neural network model, then that sub-network is sampled in the current training operation and added to the sub-network set of the current training operation. This guarantees that the structure of the target neural network model is sampled at least once per update of the super network parameters, i.e., the parameters of the target neural network model are updated in every update of the super network parameters, so when the super network converges, the sub-network corresponding to the target neural network model also converges to a good result. Optionally, if the number of gradient values stored before the current training operation is N-1 but one of the sub-network sets sampled in the preceding N-1 training operations already includes the sub-network corresponding to the target neural network model structure, the sub-networks of the current training operation may be sampled randomly.
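The check described above could look like the following sketch, where choices are per-layer candidate indices and the target sub-network is assumed to use index 0 in every layer (an illustrative convention, not mandated by the patent):

```python
def ensure_target_subnet(subnet_choices, cycle_history, gradient_set,
                         number_threshold, target_choices):
    """If the current training operation is the last one before a parameter
    update (the gradient value set is one short of the preset number threshold)
    and no operation of this update cycle has sampled the target sub-network,
    add the sub-network built from the target model's own candidate units."""
    last_before_update = (number_threshold - len(gradient_set)) == 1
    already_sampled = any(choices == target_choices
                          for choices in cycle_history + subnet_choices)
    if last_before_update and not already_sampled:
        subnet_choices.append(list(target_choices))
    return subnet_choices

# Usage, assuming the original unit sits at index 0 of every candidate set:
# subnet_choices = ensure_target_subnet(subnet_choices, cycle_history,
#                                       gradient_set, N, [0] * num_layers)
```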
In some optional implementations of the foregoing method for generating a neural network model, the foregoing method may further include: and processing the data to be processed in the preset domain by using the target neural network model.
After the parameters of the super network are synchronized to the target neural network model, the target neural network model may be used to process the data to be processed. Because the super network has reached a good convergence state after training, the parameters synchronized directly into the target neural network model are highly accurate, and processing the data to be processed of the preset domain with the target neural network model can therefore yield high accuracy.
In practice, the preset domain is, for example, a domain corresponding to the near infrared image, and the near infrared image to be processed is processed by using the target neural network model, so that a more accurate processing result can be obtained. Therefore, a relatively accurate target neural network model can be obtained after the super network is trained by using a relatively small amount of near infrared images, and a relatively accurate near infrared image processing result can be obtained.
Referring to fig. 4, as an implementation of the above method for generating a neural network model, the present disclosure provides an embodiment of an apparatus for generating a neural network model, where the embodiment of the apparatus corresponds to the embodiments of the above methods, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 4, the apparatus 400 for generating a neural network model of the present embodiment includes a construction unit 401, a training unit 402, and a synchronization unit 403. Wherein the construction unit 401 is configured to construct a super network based on the structure of the target neural network model, each layer of the super network includes a set of candidate structural units corresponding to each layer of the target neural network model, and the set of candidate structural units includes network structural units of a corresponding layer in the structure of the target neural network model and at least one candidate structural unit similar to the network structural units of a corresponding layer in the structure of the target neural network model; the training unit 402 is configured to initialize the super network, and train the super network based on sample data of a preset domain and a set of candidate structural units corresponding to each layer of the super network; the synchronization unit 403 is configured to synchronize parameters of a target sub-network corresponding to the target neural network model in the trained super network to the target neural network model.
In some embodiments, the training unit 402 is configured to train the super-network by iteratively performing a plurality of training operations. The training unit 402 includes: a sub-network sampling unit configured to perform the following steps in the training operation: sampling candidate structural units in the candidate structural unit sets corresponding to each layer of the super network to generate a sub-network set of the current training operation; a sub-network training unit configured to perform the following steps in the training operation: training the sub-networks in the sub-network set based on sample data of a preset domain, and determining the current performance of the super-network according to the training result of the sub-networks; a parameter updating unit configured to perform the following steps in the training operation: and in response to determining that the current performance of the super network does not meet the preset convergence condition, updating parameters of the super network based on the current performance of the super network, and executing the next training operation.
In some embodiments, the training unit 402 further includes: a gradient determination unit configured to perform the following steps in the training operation: determining a current gradient value of a training supervision function of the super network according to a training result of the sub network; and the parameter updating unit is further configured to update the parameters of the super network as follows: in response to determining that the number of gradient values in the gradient value set does not reach a preset number threshold, saving the current gradient value of the training supervision function of the super network into the gradient value set, wherein the initial number of gradient values in the gradient value set is 0; in response to determining that the number of gradient values in the gradient value set reaches a preset number threshold, calculating an average gradient value based on each gradient value in the gradient value set, updating parameters of the super network based on the average gradient value, and deleting each gradient value in the gradient value set.
In some embodiments, the above-mentioned sub-network sampling unit is further configured to sample candidate structural units in the candidate structural unit sets corresponding to the layers of the super-network in the following manner, to generate a sub-network set of the current training operation: and in response to determining that the difference between the number of gradient values in the gradient value set and the preset number threshold is 1 in the current training operation, and that the sub-network set sampled in each training operation respectively corresponding to the gradient values in the gradient value set does not contain the sub-network consistent with the target neural network model structure, the sub-network formed by the candidate structural units corresponding to each layer of the target neural network model structure is sampled from the super-network, and is added into the sub-network set of the current training operation.
In some embodiments, the above-mentioned sub-network sampling unit is further configured to sample candidate structural units in the candidate structural unit sets corresponding to the layers of the super-network in the following manner, to generate a sub-network set of the current training operation: and sampling the sub-network set of the current training operation from the super-network by adopting a strategy of uniformly sampling the candidate structural units corresponding to each layer of the super-network.
In some embodiments, the apparatus further comprises: and the processing unit is configured to process the data to be processed in the preset domain by utilizing the target neural network model.
The above-described apparatus 400 corresponds to the steps in the method embodiments described above. Thus, the operations, features and technical effects that can be achieved by the method for generating a neural network model described above are equally applicable to the apparatus 400 and the units contained therein, and are not described herein.
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.
Fig. 5 is a block diagram of an electronic device for the method of generating a neural network model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit the implementations described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is illustrated in fig. 5.
Memory 502 is a non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of generating a neural network model provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of generating a neural network model provided by the present application.
The memory 502 is used as a non-transitory computer readable storage medium for storing a non-transitory software program, a non-transitory computer executable program, and modules, such as program instructions/units/modules (e.g., the building unit 401, the training unit 402, and the synchronization unit 403 shown in fig. 4) corresponding to the method for generating a neural network model in the embodiment of the present application. The processor 501 executes various functional applications of the server and data processing, that is, implements the method of generating a neural network model in the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 502.
Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created from the use of the electronic device for generating the structure of the neural network, and the like. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 502 may optionally include memory located remotely from processor 501, which may be connected via a network to an electronic device used to generate the architecture of the neural network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method of generating a neural network model may further include: an input device 503 and an output device 504. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus 505 or otherwise, in fig. 5 by way of example by bus 505.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device used to generate the neural network structure, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, or joystick. The output device 504 may include a display device, auxiliary lighting (e.g., LEDs), a haptic feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The foregoing description covers only the preferred embodiments of the present disclosure and the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the application referred to in this disclosure is not limited to the specific combinations of the features described above, but also encompasses other embodiments formed by combining the above features or their equivalents in any way without departing from the spirit of the application, for example, technical solutions formed by replacing the above features with technical features of similar functions disclosed in (but not limited to) the present application.

Claims (14)

1. A method of generating a neural network model, comprising:
constructing a super network based on the structure of a target neural network model, wherein each layer of the super network comprises a candidate structure unit set corresponding to each layer of the target neural network model respectively, and the candidate structure unit set comprises network structure units of corresponding layers in the structure of the target neural network model and at least one candidate structure unit similar to the network structure units of corresponding layers in the structure of the target neural network model, wherein the target neural network model is used for executing image processing, text translation and/or voice recognition tasks of a preset domain, and the preset domain corresponds to the data type or the data generation mode of the data used for processing by the target neural network model or corresponds to the application scene of the target neural network model;
initializing the super network, and training the super network based on sample data of a preset domain and a candidate structure unit set corresponding to each layer of the super network;
and synchronizing parameters of a target sub-network corresponding to the target neural network model in the trained super-network to the target neural network model.
2. The method of claim 1, wherein the training the super network based on the sample data of the preset domain and the set of candidate structural units corresponding to each layer of the super network comprises iteratively performing a plurality of training operations;
the training operation includes:
sampling candidate structural units in the candidate structural unit sets corresponding to each layer of the super network to generate a sub-network set of the current training operation;
training the sub-networks in the sub-network set based on the sample data of the preset domain, and determining the current performance of the super-network according to the training result of the sub-networks;
and in response to determining that the current performance of the super network does not meet a preset convergence condition, updating parameters of the super network based on the current performance of the super network, and executing the next training operation.
3. The method of claim 2, wherein the training operation further comprises:
determining a current gradient value of the training supervision function of the super network according to the training result of the sub network, and storing the current gradient value of the training supervision function of the super network into a gradient value set, wherein the initial number of the gradient values in the gradient value set is 0;
The updating the parameters of the super network based on the current performance of the super network comprises:
and in response to determining that the number of gradient values in the gradient value set reaches a preset number threshold, calculating an average gradient value based on each gradient value in the gradient value set, updating parameters of the super network based on the average gradient value, and deleting each gradient value in the gradient value set.
4. A method according to claim 3, wherein the generating the sub-network set of the current training operation by sampling candidate building units in the candidate building unit set corresponding to each layer of the super-network comprises:
and in response to determining that the difference between the number of gradient values in the gradient value set and the preset number threshold is 1 in the current training operation, and that the sub-network set sampled in each training operation respectively corresponding to the gradient values in the gradient value set does not contain the sub-network consistent with the target neural network model structure, sampling the sub-network formed by the candidate structural units corresponding to each layer of the target neural network model structure from the super-network, and adding the sub-network formed by the candidate structural units into the sub-network set of the current training operation.
5. The method according to any one of claims 2-4, wherein the generating the sub-network set of the current training operation by sampling candidate building units in the candidate building unit set corresponding to each layer of the super-network comprises:
and sampling a sub-network set of the current training operation from the super-network by adopting a strategy of uniformly sampling candidate structural units corresponding to each layer of the super-network.
6. The method of claim 1, wherein the method further comprises:
and processing the data to be processed in the preset domain by using the target neural network model.
7. An apparatus for generating a neural network model, comprising:
a construction unit configured to construct a super network based on a structure of a target neural network model, wherein each layer of the super network comprises a candidate structure unit set corresponding to each layer of the target neural network model, and the candidate structure unit set comprises network structure units of corresponding layers in the structure of the target neural network model and at least one candidate structure unit similar to the network structure units of corresponding layers in the structure of the target neural network model, wherein the target neural network model is used for executing image processing, text translation and/or voice recognition tasks of a preset domain, and the preset domain corresponds to a data type or a data generation mode of data used for processing by the target neural network model or corresponds to an application scene of the target neural network model;
The training unit is configured to initialize the super network and train the super network based on sample data of a preset domain and a candidate structure unit set corresponding to each layer of the super network;
and the synchronization unit is configured to synchronize parameters of a target sub-network corresponding to the target neural network model in the trained super-network to the target neural network model.
8. The apparatus of claim 7, wherein the training unit is configured to train the super network by iteratively performing a plurality of training operations;
the training unit comprises:
a sub-network sampling unit configured to perform the following step in each training operation: sampling candidate structural units in the candidate structural unit sets corresponding to the layers of the super network to generate a sub-network set of the current training operation;
a sub-network training unit configured to perform the following steps in each training operation: training the sub-networks in the sub-network set based on the sample data of the preset domain, and determining the current performance of the super network according to the training results of the sub-networks; and
a parameter updating unit configured to perform the following step in each training operation: in response to determining that the current performance of the super network does not meet a preset convergence condition, updating the parameters of the super network based on the current performance of the super network, and executing the next training operation.
9. The apparatus of claim 8, wherein the training unit further comprises:
a gradient determination unit configured to perform the following step in each training operation: determining a current gradient value of the training supervision function of the super network according to the training results of the sub-networks, and storing the current gradient value in a gradient value set, wherein the initial number of gradient values in the gradient value set is 0; and
the parameter updating unit is further configured to update the parameters of the super network as follows:
in response to determining that the number of gradient values in the gradient value set reaches a preset number threshold, calculating an average gradient value from the gradient values in the gradient value set, updating the parameters of the super network based on the average gradient value, and deleting the gradient values from the gradient value set.
10. The apparatus of claim 9, wherein the sub-network sampling unit is further configured to sample candidate structural units in the candidate structural unit sets corresponding to the layers of the super network to generate the sub-network set of the current training operation in the following manner:
in response to determining that, in the current training operation, the number of gradient values in the gradient value set is one less than the preset number threshold, and that none of the sub-network sets sampled in the training operations corresponding to those gradient values contains a sub-network whose structure is consistent with that of the target neural network model, sampling from the super network the sub-network formed by the candidate structural units corresponding to the layers of the target neural network model, and adding that sub-network to the sub-network set of the current training operation.
11. The apparatus according to any one of claims 8-10, wherein the sub-network sampling unit is further configured to sample candidate structural units in the candidate structural unit sets corresponding to the layers of the super network to generate the sub-network set of the current training operation in the following manner:
sampling the sub-network set of the current training operation from the super network by uniformly sampling the candidate structural units corresponding to each layer of the super network.
12. The apparatus of claim 7, wherein the apparatus further comprises:
a processing unit configured to process data to be processed in the preset domain by using the target neural network model.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-6.
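
The claims above specify the super-network construction and training entirely in prose; the patent publishes no source code. As a minimal, hypothetical sketch of the construction step of claim 7 (a construction unit whose super network holds, per layer, the target model's own structural unit plus at least one similar candidate), the following Python/PyTorch fragment may help fix the idea. Every name in it — `SuperNet`, `make_candidates`, the toy linear layers — is an assumption for illustration, not the patent's implementation.

```python
# Illustrative sketch only -- not the patent's code. Assumes PyTorch.
import copy
import torch
import torch.nn as nn


def make_candidates(layer: nn.Linear) -> nn.ModuleList:
    """Candidate structural unit set for one layer: index 0 mirrors the
    target model's own unit; the rest are similar variants."""
    i, o = layer.in_features, layer.out_features
    return nn.ModuleList([
        copy.deepcopy(layer),                       # the target's own unit
        nn.Linear(i, o, bias=False),                # a similar unit without bias
        nn.Sequential(nn.Linear(i, o), nn.ReLU()),  # a similar unit with activation
    ])


class SuperNet(nn.Module):
    """Each layer of the super network holds the candidate structural unit
    set corresponding to one layer of the target model."""

    def __init__(self, target_layers):
        super().__init__()
        self.layers = nn.ModuleList(make_candidates(l) for l in target_layers)

    def forward(self, x, choices):
        # choices[i] picks one candidate unit per layer, defining a sub-network.
        for candidates, c in zip(self.layers, choices):
            x = candidates[c](x)
        return x


# Toy target model standing in for the real target neural network model.
target = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 4))
super_net = SuperNet(list(target))
```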
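Continuing the same hypothetical example, the next fragment imitates one training operation as described in claims 3-5 (and the mirroring apparatus claims 9-11): sub-networks are sampled uniformly per layer; one operation before a parameter update, the sub-network matching the target model is forced into the sample if it has not been sampled since the previous update; and gradients are accumulated in a gradient value set until a preset number threshold `K` is reached, then averaged, applied, and cleared. `K`, the MSE loss, and the random tensors standing in for preset-domain sample data are all assumptions; the block reuses `super_net` from the previous sketch.

```python
# Illustrative sketch only -- reuses SuperNet/super_net from the previous block.
import random
import torch
import torch.nn.functional as F

K = 4                                           # preset number threshold (assumed)
grad_set = []                                   # gradient value set, initially empty
sampled_since_update = []                       # sub-networks seen since last update
target_choices = [0] * len(super_net.layers)    # sub-network matching the target model
params = [p for p in super_net.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.01)


def sample_subnets(n=2):
    """Uniformly sample one candidate structural unit per layer (claims 5/11)."""
    subnets = [[random.randrange(len(cands)) for cands in super_net.layers]
               for _ in range(n)]
    # Claims 4/10: one operation before an update, force in the target sub-network
    # if it has not been sampled since the previous update.
    if (len(grad_set) == K - 1
            and target_choices not in sampled_since_update + subnets):
        subnets.append(list(target_choices))
    sampled_since_update.extend(subnets)
    return subnets


def training_operation(x, y):
    # Train the sampled sub-networks; their mean loss stands in for the
    # "current performance" of the super network.
    losses = [F.mse_loss(super_net(x, choices), y) for choices in sample_subnets()]
    loss = torch.stack(losses).mean()
    grad_set.append(torch.autograd.grad(loss, params, allow_unused=True))

    # Claims 3/9: average the stored gradients, update, then clear the set.
    if len(grad_set) >= K:
        for p, *gs in zip(params, *grad_set):
            gs = [g for g in gs if g is not None]
            if gs:
                p.grad = torch.stack(gs).mean(dim=0)
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
        grad_set.clear()
        sampled_since_update.clear()
    return loss.item()


# Stand-in for sample data of the preset domain.
x, y = torch.randn(32, 8), torch.randn(32, 4)
for _ in range(2 * K):
    training_operation(x, y)
```

Averaging the stored gradients before a single optimizer step is one plausible reading of the claimed "average gradient value"; it amounts to accumulating gradients over several sampled sub-network batches before each update of the super-network parameters.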
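Finally, a sketch of the synchronization unit of claim 7 together with the processing step of claims 6 and 12: the parameters of the target sub-network (candidate index 0 of every layer in this toy setup) are copied from the trained super network back into the target model, which then processes data of the preset domain. As before, this merely continues the illustrative example above; real data loading and task heads are omitted.

```python
# Illustrative sketch only -- reuses super_net and target from the blocks above.
import torch

# Synchronize the target sub-network's parameters into the target model.
with torch.no_grad():
    for target_layer, candidates in zip(target, super_net.layers):
        # Candidate index 0 mirrors the target model's own structural unit.
        target_layer.load_state_dict(candidates[0].state_dict())

# Process data to be processed in the preset domain with the synchronized model.
target.eval()
with torch.no_grad():
    prediction = target(torch.randn(1, 8))   # stand-in for preset-domain data
print(prediction.shape)                      # torch.Size([1, 4])
```
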
CN202010387565.5A 2020-05-09 2020-05-09 Method and device for generating neural network model Active CN111582452B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010387565.5A CN111582452B (en) 2020-05-09 2020-05-09 Method and device for generating neural network model

Publications (2)

Publication Number Publication Date
CN111582452A CN111582452A (en) 2020-08-25
CN111582452B (en) 2023-10-27

Family

ID=72122988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010387565.5A Active CN111582452B (en) 2020-05-09 2020-05-09 Method and device for generating neural network model

Country Status (1)

Country Link
CN (1) CN111582452B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114627981A (en) * 2020-12-14 2022-06-14 阿里巴巴集团控股有限公司 Method and apparatus for generating molecular structure of compound, and nonvolatile storage medium
CN112560985B (en) * 2020-12-25 2024-01-12 北京百度网讯科技有限公司 Neural network searching method and device and electronic equipment
CN112686321A (en) * 2020-12-31 2021-04-20 北京迈格威科技有限公司 Method, apparatus, device and medium for determining performance parameter value of network
CN113807496A (en) * 2021-05-31 2021-12-17 华为技术有限公司 Method, apparatus, device, medium and program product for constructing neural network model
CN114492765A (en) * 2022-02-24 2022-05-13 腾讯科技(深圳)有限公司 Model optimization method, device, equipment, storage medium and program product

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019028725A1 (en) * 2017-08-10 2019-02-14 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
US11250314B2 (en) * 2017-10-27 2022-02-15 Cognizant Technology Solutions U.S. Corporation Beyond shared hierarchies: deep multitask learning through soft layer ordering
CN109325985B (en) * 2018-09-18 2020-07-21 上海联影智能医疗科技有限公司 Magnetic resonance image reconstruction method, apparatus and computer readable storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
WO2019147918A1 (en) * 2018-01-26 2019-08-01 Alibaba Group Holding Limited Method for training fraudulent transaction detection model, detection method, and corresponding apparatus
WO2020069039A1 (en) * 2018-09-27 2020-04-02 Salesforce.Com, Inc. Continual neural network learning via explicit structure learning
CN110766142A (en) * 2019-10-30 2020-02-07 北京百度网讯科技有限公司 Model generation method and device
CN110807515A (en) * 2019-10-30 2020-02-18 北京百度网讯科技有限公司 Model generation method and device
CN110782034A (en) * 2019-10-31 2020-02-11 北京小米智能科技有限公司 Neural network training method, device and storage medium
CN110852421A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Model generation method and device
CN110852438A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Model generation method and device
CN110909877A (en) * 2019-11-29 2020-03-24 百度在线网络技术(北京)有限公司 Neural network model structure searching method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Synthesis model of a 3D printing network system and intelligent networking applications; Liu Lizhao; Dong Genshun; Li Jun; Liu Lijian; Zhu Shunzhi; Control Engineering of China (Issue 09); pp. 128-136 *
Research on adaptive image style transfer algorithms based on deep neural networks and content preservation; Liu Ruixue; Master's thesis, Tianjin Polytechnic University; pp. 1-14 *

Also Published As

Publication number Publication date
CN111582452A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
CN111582452B (en) Method and device for generating neural network model
CN111539514B (en) Method and apparatus for generating a structure of a neural network
KR102484617B1 (en) Method and apparatus for generating model for representing heterogeneous graph node, electronic device, storage medium and program
JP7166322B2 (en) Methods, apparatus, electronics, storage media and computer programs for training models
JP7135143B2 (en) Methods, apparatus, electronic devices and computer readable storage media for building keypoint learning models
CN111582453A (en) Method and device for generating neural network model
CN111582477B (en) Training method and device for neural network model
CN110795569B (en) Method, device and equipment for generating vector representation of knowledge graph
CN111539479B (en) Method and device for generating sample data
CN111582479B (en) Distillation method and device for neural network model
CN111582454B (en) Method and device for generating neural network model
CN111708876B (en) Method and device for generating information
CN112270711B (en) Model training and posture prediction method, device, equipment and storage medium
KR20210132578A (en) Method, apparatus, device and storage medium for constructing knowledge graph
CN111563592B (en) Neural network model generation method and device based on super network
CN111241838B (en) Semantic relation processing method, device and equipment for text entity
CN111563593A (en) Training method and device of neural network model
CN114492788A (en) Method and device for training deep learning model, electronic equipment and storage medium
CN111680597A (en) Face recognition model processing method, device, equipment and storage medium
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN112015439B (en) Embedding method, device, equipment and storage medium of user APP interest
CN111833391B (en) Image depth information estimation method and device
CN111160552B (en) News information recommendation processing method, device, equipment and computer storage medium
CN111753758A (en) Model generation method and device, electronic equipment and storage medium
CN116030235A (en) Target detection model training method, target detection device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant