CN110766142A - Model generation method and device - Google Patents

Model generation method and device Download PDF

Info

Publication number
CN110766142A
Authority
CN
China
Prior art keywords
neural network
candidate
candidate neural
value
reward
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911045657.9A
Other languages
Chinese (zh)
Inventor
希滕
张刚
温圣召
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201911045657.9A priority Critical patent/CN110766142A/en
Publication of CN110766142A publication Critical patent/CN110766142A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/086Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming

Abstract

The present disclosure relates to the field of artificial intelligence technology. Embodiments of the disclosure provide a model generation method and a model generation device. The method comprises the following steps: acquiring a first neural network for executing a deep learning task; and searching out a second neural network by executing a plurality of iterative operations. The iterative operation comprises: updating a preset model structure controller based on the current reward feedback value, and generating a candidate neural network by using the updated model structure controller; distilling the candidate neural network based on the first neural network, and determining a distillation loss function of the distilled candidate neural network; updating the reward feedback value based on the distillation loss function of the distilled candidate neural network; and in response to determining that the reward feedback value reaches a preset convergence condition or the accumulated number of iterative operations reaches a preset threshold, determining the distilled candidate neural network obtained in the current iterative operation as the searched second neural network. The method can automatically search out a neural network model structure suitable for distillation.

Description

Model generation method and device
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to the technical field of artificial intelligence, and particularly relates to a model generation method and device.
Background
With the development of artificial intelligence technology, deep learning has achieved good results in many application fields. In deep learning, the structure of the neural network has a very important influence on the effect of the model. In practice, in order to obtain higher performance, the structural complexity of the neural network is higher, and more computing resources need to be consumed for operating the neural network. And the manual design of the network structure requires very rich experience and multiple attempts, and is high in cost.
Model distillation is a means of using a large model to supervise the training of a small network so that the small network approaches the performance of the large network. Model distillation can effectively reduce the resource consumption of running a model, and the existing approach is to distill a manually designed small network with a large model. However, the structure of a manually designed small network may not be suitable for distillation, or the structures of the manually designed small network and the large network may not match, resulting in an unsatisfactory distillation effect.
Disclosure of Invention
Embodiments of the present disclosure provide a model generation method and apparatus, an electronic device, and a computer-readable medium.
In a first aspect, an embodiment of the present disclosure provides a model generation method, including: acquiring a first neural network for executing a deep learning task; and searching out a second neural network for executing the deep learning task by executing a plurality of iterative operations; wherein the iterative operation comprises: updating a preset model structure controller based on the current reward feedback value, and generating a candidate neural network by using the updated model structure controller, wherein the structural complexity of the candidate neural network is lower than that of the first neural network, and the initial value of the reward feedback value is a preset value; distilling the candidate neural network based on the first neural network, and determining a distillation loss function of the distilled candidate neural network; updating the reward feedback value based on the distillation loss function of the distilled candidate neural network; and in response to determining that the reward feedback value reaches a preset convergence condition or the accumulated number of iterative operations reaches a preset threshold, determining the distilled candidate neural network obtained in the current iterative operation as the searched second neural network.
In some embodiments, the generating of the candidate neural network using the updated model structure controller includes: determining variable structure parameters of the network structure from a preset network structure search space by using the updated model structure controller, and generating a coding sequence for representing the structure of the candidate neural network.
In some embodiments, the model structure controller comprises a recurrent neural network; and the updating of the preset model structure controller based on the current reward feedback value comprises: updating the parameters of the recurrent neural network by using a back propagation algorithm based on the current reward feedback value, so that the updated model structure controller generates a candidate neural network that increases the reward feedback value.
In some embodiments, the updating of the preset model structure controller based on the current reward feedback value includes: generating a plurality of neural network models as a population based on a preset network structure search space, and taking the current reward feedback value as the fitness, within the population, of the candidate neural network generated in the current iteration operation; and updating the model structure controller based on the fitness of the candidate neural network in the population, so that the model structure controller evolves the population through a genetic algorithm in the next iteration operation to generate a neural network model with increased fitness as the candidate neural network of the next iteration operation.
In some embodiments, the distillation loss function comprises: a first loss function characterizing differences between features respectively extracted by the first neural network and the candidate neural network; or the distillation loss function comprises: a first loss function and a second loss function characterizing a difference between results of execution of the deep learning task by the first neural network and the candidate neural network, respectively.
In a second aspect, an embodiment of the present disclosure provides a model generation apparatus, including: an acquisition unit configured to acquire a first neural network for performing a deep learning task; a search unit configured to search out a second neural network for performing a deep learning task by performing a plurality of iterative operations; wherein the iterative operation comprises:
updating a preset model structure controller based on the current reward feedback value, and generating a candidate neural network by using the updated model structure controller, wherein the structural complexity of the candidate neural network is lower than that of the first neural network, and the initial value of the reward feedback value is a preset value; distilling the candidate neural network based on the first neural network, and determining a distillation loss function of the distilled candidate neural network; updating the reward feedback value based on the distillation loss function of the distilled candidate neural network; and in response to determining that the reward feedback value reaches a preset convergence condition or the accumulated number of iterative operations reaches a preset threshold, determining the distilled candidate neural network obtained in the current iterative operation as the searched second neural network.
In some embodiments, in the iterative operation performed by the search unit, the generating of the candidate neural network by using the updated model structure controller includes: determining variable structure parameters of the network structure from a preset network structure search space by using the updated model structure controller, and generating a coding sequence for representing the structure of the candidate neural network.
In some embodiments, the model structure controller comprises a recurrent neural network; and in the iterative operation executed by the search unit, the updating of the preset model structure controller based on the current reward feedback value comprises: updating the parameters of the recurrent neural network by using a back propagation algorithm based on the current reward feedback value, so that the updated model structure controller generates a candidate neural network that increases the reward feedback value.
In some embodiments, in the iterative operation performed by the search unit, the updating of the preset model structure controller based on the current reward feedback value includes: generating a plurality of neural network models as a population based on a preset network structure search space, and taking the current reward feedback value as the fitness, within the population, of the candidate neural network generated in the current iteration operation; and updating the model structure controller based on the fitness of the candidate neural network in the population, so that the model structure controller evolves the population through a genetic algorithm in the next iteration operation to generate a neural network model with increased fitness as the candidate neural network of the next iteration operation.
In some embodiments, the distillation loss function comprises: a first loss function characterizing differences between features respectively extracted by the first neural network and the candidate neural network; or the distillation loss function comprises: a first loss function and a second loss function characterizing a difference between results of execution of the deep learning task by the first neural network and the candidate neural network, respectively.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement the model generation method as provided in the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the model generation method provided in the first aspect.
According to the model generation method and the model generation device of the embodiments of the present disclosure, a first neural network for executing a deep learning task is acquired; then, a second neural network for executing the deep learning task is searched out by executing a plurality of iterative operations; wherein the iterative operation comprises:
updating a preset model structure controller based on the current reward feedback value, and generating a candidate neural network by using the updated model structure controller, wherein the structural complexity of the candidate neural network is lower than that of the first neural network, and the initial value of the reward feedback value is a preset value; distilling the candidate neural network based on the first neural network, and determining a distillation loss function of the distilled candidate neural network; updating the reward feedback value based on the distillation loss function of the distilled candidate neural network; and in response to determining that the reward feedback value reaches a preset convergence condition or the accumulated number of iterative operations reaches a preset threshold, determining the distilled candidate neural network obtained in the current iterative operation as the searched second neural network. The model generation method and the model generation device can automatically search out the structure of a small model suitable for distillation, reduce the consumption of computing resources, and improve the distillation effect of the model.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which embodiments of the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of a model generation method according to the present disclosure;
FIG. 3 is a schematic diagram of one implementation of a model generation method according to the present disclosure;
FIG. 4 is a schematic structural diagram of one embodiment of a model generation apparatus of the present disclosure;
FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which the model generation method or model generation apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. The end devices 101, 102, 103 may be customer premises devices on which various client applications may be installed. Such as image processing-type applications, information analysis-type applications, voice assistant-type applications, shopping-type applications, financial-type applications, and the like.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server that runs various services, such as a server that runs a neural network structure search task, and further such as a server that runs a model distillation task. The server 105 may construct training samples by obtaining deep learning task data collected from the terminal devices 101, 102, 103 or obtaining deep learning task data from a database, and automatically search and optimize a model structure of a neural network for performing a deep learning task.
The server 105 may also be a backend server providing backend support for applications installed on the terminal devices 101, 102, 103. For example, the server 105 may receive information to be processed sent by the terminal devices 101, 102, 103, process the information using the neural network model, and return the processing results to the terminal devices 101, 102, 103.
In a real scenario, the terminal devices 101, 102, 103 may send a deep learning task request related to speech recognition, text classification, dialogue behavior classification, image recognition, etc. tasks to the server 105. A neural network model, which has been trained for a corresponding deep learning task, may be run on the server 105, with which information is processed.
It should be noted that the model generation method provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the model generation apparatus is generally provided in the server 105.
In some scenarios, server 105 may retrieve source data (e.g., training samples, non-optimized neural networks, etc.) required for model generation from a database, memory, or other device, in which case exemplary system architecture 100 may be absent of terminal devices 101, 102, 103 and network 104.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a model generation method according to the present disclosure is shown. The model generation method comprises the following steps:
in step 201, a first neural network for performing a deep learning task is obtained.
The deep learning task may be an information processing task that is accomplished using a deep neural network. In practice, the deep learning task may be, for example: speech recognition, speech synthesis, text translation, natural language understanding, image processing, trend prediction, target detection and tracking, and the like.
In this embodiment, a first neural network that has been fully trained for a particular deep learning task may be obtained. The first neural network can be obtained by iterative training on a large amount of sample data using gradient descent and back propagation, and has a relatively complex structure. The first neural network may be stored at a designated location after training is completed. In this embodiment, an execution subject of the model generation method (for example, the server shown in fig. 1) may acquire the first neural network from the designated location.
In step 202, a second neural network for executing the deep learning task is searched out by executing a plurality of iterative operations.
In this embodiment, model distillation may be used to obtain a second neural network that has a simpler structure and performance close to that of the first neural network. Here, the structural complexity of the first neural network is higher than that of the second neural network. Specifically, the second neural network can be adjusted iteratively during the distillation process, so that its execution result on the deep learning task approaches the execution result of the first neural network on the deep learning task.
Specifically, the structure of the second neural network can be searched out within the pre-constructed search space by sequentially performing a plurality of iterative operations.
The iterative operation includes step 2021, step 2022, step 2023, and step 2024.
First, in step 2021, a preset model structure controller is updated based on the current reward feedback value, and a candidate neural network is generated by using the updated model structure controller.
A reinforcement learning method may be used to guide the update of the model structure controller with a reward feedback value (reward) that characterizes the performance of the model structure controller. Here, the initial value of the reward feedback value may be a preset value, for example, set to 0 in advance. When the first iteration operation is performed, the model structure controller may be updated with this initial value as the reward feedback value for the current iteration operation. In a non-first iteration, the reward feedback value updated after the last iteration can be used as the reward feedback value in the current iteration.
The model structure controller is used to control or generate the structures of neural network models, and may be embodied as a machine learning algorithm, such as a recurrent neural network or a genetic algorithm. The model structure controller may select and combine parameters of model structure units in a preset search space to generate candidate neural networks.
Here, the search space is a search space of a structure of the second neural network, which may include model structural units such as neural network layer structures of various convolutional layers, pooling layers, and the like, or structural units formed by combining at least two neural network layers having specific structural parameters.
The structural complexity of the candidate neural network generated by the model structure controller is lower than that of the first neural network described above. In this embodiment, the model structure controller may be trained in advance to generate a neural network with a simpler structure, or a constraint condition that the structural complexity of the generated neural network does not exceed a preset complexity may be added in the design of the model structure controller, or after the model structure controller generates the neural network, the neural network with the structural complexity not exceeding the preset complexity may be preliminarily screened out as a candidate neural network.
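As a non-limiting illustration of the last of these options, screening candidates by structural complexity may be sketched as follows; the helper names and the use of a parameter-count budget are assumptions made for the example rather than requirements of this embodiment.

```python
# Illustrative sketch: keep only generated structures whose complexity
# (approximated here by parameter count, an assumption) stays within a
# preset fraction of the first network's complexity.
def count_parameters(model):
    # model is assumed to expose parameters() like a torch.nn.Module
    return sum(p.numel() for p in model.parameters())

def screen_candidates(generated_models, first_network, max_ratio=0.5):
    budget = max_ratio * count_parameters(first_network)
    return [m for m in generated_models if count_parameters(m) <= budget]
```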
In some alternative implementations of the present embodiment, the model structure controller may include a recurrent neural network. In this case, the preset model structure controller may be updated as follows: updating the parameters of the recurrent neural network by using a back propagation algorithm based on the current reward feedback value, so that the updated model structure controller generates a candidate neural network that increases the reward feedback value. The reward feedback value may be calculated based on the distillation loss function of the candidate neural network in the last iteration. The reward feedback value can be back-propagated and the parameters of the recurrent neural network adjusted by gradient descent, so that the value of the distillation loss function of the new candidate neural network generated by the recurrent neural network after the parameter adjustment is reduced and the corresponding reward feedback value is increased.
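One possible realization of such a recurrent-network controller and its reward-driven update is sketched below; the class and function names, the LSTM cell, and the REINFORCE-style policy-gradient update are assumptions chosen for illustration and are not mandated by this embodiment.

```python
import torch
import torch.nn as nn

class RNNController(nn.Module):
    """Samples a coding sequence of structure choices, one token per step."""
    def __init__(self, num_choices, hidden_size=64):
        super().__init__()
        self.embed = nn.Embedding(num_choices, hidden_size)
        self.cell = nn.LSTMCell(hidden_size, hidden_size)
        self.head = nn.Linear(hidden_size, num_choices)
        self.hidden_size = hidden_size

    def sample(self, seq_len):
        h = torch.zeros(1, self.hidden_size)
        c = torch.zeros(1, self.hidden_size)
        token = torch.zeros(1, dtype=torch.long)
        tokens, log_probs = [], []
        for _ in range(seq_len):
            h, c = self.cell(self.embed(token), (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            token = dist.sample()
            tokens.append(token.item())
            log_probs.append(dist.log_prob(token))
        return tokens, torch.stack(log_probs).sum()

def update_controller(controller, optimizer, log_prob_sum, reward):
    # REINFORCE-style step: increasing the reward feedback value is expressed
    # as minimizing -(reward * log-probability of the sampled structure).
    loss = -reward * log_prob_sum
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```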
In other alternative implementations of the present embodiment, the model structure controller may employ a genetic algorithm. In this case, the preset model structure controller may be updated as follows: first, generating a plurality of neural network models as a population based on a preset network structure search space, and taking the current reward feedback value as the fitness, within the population, of the candidate neural network generated in the current iteration operation; then, updating the model structure controller based on the fitness of the candidate neural network in the population, so that in the next iteration operation the model structure controller evolves the population through the genetic algorithm to generate a neural network model with increased fitness as the candidate neural network of the next iteration operation.
Specifically, a plurality of neural network models may be generated as a population by combining structural units and/or parameters in the network structure search space according to certain rules, and the reward feedback value of each distilled candidate neural network may be used as its fitness to guide the evolution of the population. The population evolution may include adjusting the structures of the original neural network models in the population according to the fitness, or selecting the neural network models with higher fitness from the population.
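By way of illustration only, one possible form of such population evolution over coding sequences is sketched below; single-point crossover, random mutation, and the specific function names are assumptions for the example rather than part of the original disclosure.

```python
import random

def evolve_population(population, fitness, num_choices, mutation_rate=0.1):
    """population: list of coding sequences (lists of ints, assumed length >= 2);
    fitness: reward feedback value of each distilled candidate."""
    ranked = [seq for _, seq in sorted(zip(fitness, population),
                                       key=lambda pair: pair[0], reverse=True)]
    parents = ranked[: max(2, len(ranked) // 2)]           # selection
    children = []
    while len(children) < len(population):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(a))                   # single-point crossover
        child = a[:cut] + b[cut:]
        child = [random.randrange(num_choices)              # mutation
                 if random.random() < mutation_rate else gene
                 for gene in child]
        children.append(child)
    return children
```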
In some optional implementations of the present embodiment, the step of generating the candidate neural network using the updated model structure controller may be performed as follows: and determining variable structure parameters of the network structure from a preset network structure search space by adopting the updated model structure controller, and generating a coding sequence for representing the structure of the candidate neural network.
Here, the variable structure parameter characterizes a structural parameter in the neural network that can be designed, selected, or combined to change the structure of the neural network. The variable structure parameters may include the number of layers, the size of neurons per layer, parameters characterizing the connection relationships between neurons, etc., and may include, for example, convolution kernels, the size of the convolution kernels, the number of convolution/pooling/fully-connected layers, convolution kernel expansion coefficients, etc.
A model structure controller may be employed to encode the structure of the candidate neural networks and to serialize the respective candidate neural networks, each of the resulting sequences representing a candidate neural network. Therefore, different candidate neural networks can be effectively distinguished, and the representation mode of the serialized candidate neural networks is beneficial to improving the training efficiency in a distributed training scene.
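A minimal sketch of such a coding scheme is given below; the particular variable structure parameters and value lists in the search space are assumptions used only to illustrate how a structure maps to a coding sequence and back.

```python
# Assumed example search space: each variable structure parameter has a small
# list of allowed values, and a candidate structure is encoded as the list of
# chosen indices (one index per parameter).
SEARCH_SPACE = {
    "num_layers":  [4, 8, 12],
    "kernel_size": [3, 5, 7],
    "channels":    [16, 32, 64],
    "dilation":    [1, 2, 4],
}

def encode_candidate(choices):
    """choices: mapping from parameter name to the chosen value."""
    return [SEARCH_SPACE[name].index(choices[name]) for name in SEARCH_SPACE]

def decode_candidate(code):
    """Recover the structure parameters represented by a coding sequence."""
    return {name: SEARCH_SPACE[name][idx]
            for name, idx in zip(SEARCH_SPACE, code)}
```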
It should be noted that the model structure controller may generate a plurality of candidate neural networks in one iteration, for example, when the recurrent neural network is adopted as the model structure controller, each element in the output sequence of the recurrent neural network characterizes one candidate neural network.
Step 2022, distilling the candidate neural network based on the first neural network, and determining a distillation loss function of the distilled candidate neural network.
The first neural network may be used as a teacher network and each candidate neural network may be used as a student network, and the candidate neural networks may be distilled separately. In the distillation process, parameters of the candidate neural network can be iteratively adjusted based on the distillation loss function, so that the intermediate processing result and the final processing result of the candidate neural network on the input data are respectively consistent with the intermediate processing result and the final processing result of the first neural network on the input data.
Here, the distillation loss function may characterize the training targets of the student network. The training targets of the student network may include soft targets. A soft target refers to the difference between the outputs of intermediate layers of the student network and the teacher network. For example, in a classification task, the soft target represents the difference between the class probabilities obtained by the student network and those obtained by the teacher network, and can be represented by the cross entropy of the two.
Alternatively, the distillation loss function may include a first loss function characterizing the difference between the features extracted by the first neural network and by the candidate neural network, respectively. In particular, the first loss function may be constructed based on the difference between the features extracted by the last feature extraction layer (e.g., the last convolutional layer or the last pooling layer) of the first neural network and of the candidate neural network. Alternatively, the output of the last fully-connected layer of the first neural network and the output of the last fully-connected layer of the candidate neural network may be subjected to nonlinear processing using the same nonlinear function (e.g., softmax, sigmoid, etc.), and the difference between them (e.g., an L2 norm characterizing the difference) may then be calculated as the first loss function.
Here, the first loss function is not the loss function of the deep learning task itself; it represents a loss on the output of an intermediate layer of the neural network. Since the output of the intermediate layer has not been subjected to one-hot processing, the first loss function can provide more information and a more accurate training target than the loss function of the deep learning task itself.
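A sketch of such a first loss function, following the softmax-plus-L2-norm option described above, is given below; the function name and the averaging over the batch are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def first_loss(teacher_out, student_out):
    # Apply the same nonlinear function (softmax assumed here) to both outputs,
    # then measure their difference with an L2 norm, averaged over the batch.
    t = F.softmax(teacher_out, dim=-1)
    s = F.softmax(student_out, dim=-1)
    return torch.norm(t - s, p=2, dim=-1).mean()
```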
Alternatively, the distillation loss function may include the above-described first loss function representing a difference between the features extracted by the first neural network and the candidate neural network, respectively, and the second loss function representing a difference between the results of the execution of the deep learning task by the first neural network and the candidate neural network, respectively. The second loss function may characterize the loss function of the candidate neural network to the deep learning task itself. The first loss function and the second loss function may be weighted and summed as a distillation loss function.
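One possible distillation step using such a weighted sum is sketched below; the weighting coefficient, the use of cross entropy as the second (task) loss for an assumed classification task, and the helper names (including first_loss from the previous sketch) are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, optimizer, inputs, labels, alpha=0.5):
    with torch.no_grad():
        teacher_out = teacher(inputs)              # the first (teacher) network stays fixed
    student_out = student(inputs)
    loss1 = first_loss(teacher_out, student_out)             # difference between outputs/features
    loss2 = F.cross_entropy(student_out, labels)             # task loss (classification assumed)
    distillation_loss = alpha * loss1 + (1.0 - alpha) * loss2
    optimizer.zero_grad()
    distillation_loss.backward()
    optimizer.step()
    return distillation_loss.item()
```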
The candidate neural network is iterated continuously during the distillation process, and the iteration is stopped when the value of the distillation loss function converges within a certain range or the accumulated number of iterations of the candidate neural network in the distillation process reaches a preset maximum number of iterations, thereby obtaining the distilled candidate neural network.
At step 2023, the reward feedback value is updated based on the distillation loss function of the distilled candidate neural network.
The reward feedback value may be inversely related to the value of the distillation loss function of the distilled candidate neural network, for example, the inverse of the value of the distillation loss function of the distilled candidate neural network may be taken as the new reward feedback value. That is, the smaller the distillation loss of the distilled candidate neural network, the larger the value of the reward feedback. In this way, after the model structure controller is guided to be updated by the reward feedback value, the updated model structure controller can be enabled to generate a candidate neural network capable of achieving smaller distillation loss.
Optionally, the reward feedback value may be further updated based on statistical data such as an average or an accumulated value of values of distillation loss functions of a plurality of candidate neural networks in the current iteration.
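As an illustrative sketch consistent with the above, the reward feedback value may be derived from the distillation losses of the candidates in the current iteration as follows; taking the reciprocal of the mean loss is only one of the possible inverse relations, and the function name is an assumption.

```python
def reward_from_losses(distillation_losses, eps=1e-8):
    # Smaller distillation loss -> larger reward feedback value.
    mean_loss = sum(distillation_losses) / len(distillation_losses)
    return 1.0 / (mean_loss + eps)
```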
Step 2024, in response to determining that the reward feedback value reaches the preset convergence condition or the cumulative number of iterations reaches the preset threshold, determining the distilled candidate neural network obtained in the current iteration as the searched second neural network.
After updating the reward feedback value, it may be determined whether the reward feedback value reaches a predetermined convergence condition, for example, whether a change rate of the reward feedback value in the last consecutive iterations is lower than a predetermined change rate threshold, and if so, the iterations may be stopped. And taking the distilled candidate neural network obtained by the current iteration operation as a searched second neural network.
After the current iteration operation is completed, 1 may be added to the accumulated iteration operation number, and then it is determined whether the accumulated iteration operation number reaches a preset number threshold, and if so, the iteration operation may be stopped. And taking the distilled candidate neural network obtained by the current iteration operation as a searched second neural network.
If the reward feedback value does not reach the preset convergence condition and the accumulated number of iterative operations does not reach the preset threshold, the next iterative operation is executed based on the updated reward feedback value: step 2021 is executed to re-determine the candidate neural network, step 2022 is executed to distill the new candidate neural network, step 2023 is executed to update the reward feedback value based on the new candidate neural network, and step 2024 is executed to judge whether the iteration stop condition is reached. In this way, the iterative operation is executed repeatedly until the reward feedback value after a certain iterative operation reaches the preset convergence condition or the accumulated number of iterative operations reaches the preset threshold, at which point the iterative operation stops and the searched second neural network is obtained.
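The overall iterative search (steps 2021 to 2024) can be summarized by the following sketch; the callable arguments stand in for the operations described above, and their names are assumptions for the example.

```python
def search_second_network(first_network, generate_candidate, distill,
                          reward_fn, converged, max_iters):
    """generate_candidate(reward) -> candidate network           (step 2021)
    distill(first_network, candidate) -> (distilled, losses)     (step 2022)
    reward_fn(losses) -> updated reward feedback value           (step 2023)
    converged(reward) -> True when the preset condition is met   (step 2024)"""
    reward = 0.0                       # preset initial reward feedback value
    second_network = None
    for _ in range(max_iters):         # preset threshold on the number of iterations
        candidate = generate_candidate(reward)
        second_network, losses = distill(first_network, candidate)
        reward = reward_fn(losses)
        if converged(reward):
            break
    return second_network
```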
The second neural network and the first neural network are used for executing the same deep learning task, have performance similar to that of the first neural network, but have lower structural complexity than that of the first neural network, so that the computation amount can be reduced and the computing resources can be saved when the second neural network is used for executing the deep learning task. The method for generating the model can guide the network structure search based on the distillation effect of the neural network, so that the small network structure suitable for distillation is automatically searched by the method for generating the model, and the distillation effect of the model is improved.
With continued reference to FIG. 3, a schematic diagram of one implementation of a model generation method according to the present disclosure is shown.
As shown in fig. 3, in one iteration, the model structure controller determines candidate neural networks s1, s2, s3, …, sN in the search space; the candidate neural networks s1, s2, s3, …, sN are distilled to obtain their distillation losses; a reward feedback value (reward) is obtained from the distillation losses and fed back to the model structure controller. Then the next iteration is started: the model structure controller determines new candidate neural networks s1, s2, s3, …, sN, the new candidate neural networks are distilled, their distillation losses are determined, and the reward value obtained from the distillation losses is fed back to the model structure controller. This is repeated over many iterations to search for the candidate model structure best suited for distillation.
With further reference to fig. 4, as an implementation of the above model generation method, the present disclosure provides an embodiment of a model generation apparatus, which corresponds to the method embodiment shown in fig. 2, and which may be applied in various electronic devices.
As shown in fig. 4, the model generation apparatus 400 of the present embodiment includes: an acquisition unit 401 and a search unit 402. The acquisition unit 401 is configured to acquire a first neural network for performing a deep learning task; the search unit 402 is configured to search out a second neural network for performing the deep learning task by performing a plurality of iterative operations; wherein the iterative operation comprises: updating a preset model structure controller based on the current reward feedback value, and generating a candidate neural network by using the updated model structure controller, wherein the structural complexity of the candidate neural network is lower than that of the first neural network, and the initial value of the reward feedback value is a preset value; distilling the candidate neural network based on the first neural network, and determining a distillation loss function of the distilled candidate neural network; updating the reward feedback value based on the distillation loss function of the distilled candidate neural network; and in response to determining that the reward feedback value reaches a preset convergence condition or the accumulated number of iterative operations reaches a preset threshold, determining the distilled candidate neural network obtained in the current iterative operation as the searched second neural network.
In some embodiments, in the iterative operation performed by the search unit, the generating of the candidate neural network by using the updated model structure controller includes: determining variable structure parameters of the network structure from a preset network structure search space by using the updated model structure controller, and generating a coding sequence for representing the structure of the candidate neural network.
In some embodiments, the model structure controller comprises a recurrent neural network; and in the iterative operation executed by the search unit, the updating of the preset model structure controller based on the current reward feedback value comprises: updating the parameters of the recurrent neural network by using a back propagation algorithm based on the current reward feedback value, so that the updated model structure controller generates a candidate neural network that increases the reward feedback value.
In some embodiments, in the iterative operation performed by the search unit, the updating of the preset model structure controller based on the current reward feedback value includes: generating a plurality of neural network models as a population based on a preset network structure search space, and taking the current reward feedback value as the fitness, within the population, of the candidate neural network generated in the current iteration operation; and updating the model structure controller based on the fitness of the candidate neural network in the population, so that the model structure controller evolves the population through a genetic algorithm in the next iteration operation to generate a neural network model with increased fitness as the candidate neural network of the next iteration operation.
In some embodiments, the distillation loss function comprises: a first loss function characterizing differences between features respectively extracted by the first neural network and the candidate neural network; or the distillation loss function comprises: a first loss function and a second loss function characterizing a difference between results of execution of the deep learning task by the first neural network and the candidate neural network, respectively.
The acquisition unit 401 and the search unit 402 in the apparatus 400 described above correspond to step 201 and step 202 (including step 2021 to step 2024) in the method described with reference to fig. 2, respectively. Thus, the operations, features and technical effects described above for the model generation method are also applicable to the apparatus 400 and the units included therein, and are not described herein again.
Referring now to FIG. 5, a schematic diagram of an electronic device (e.g., the server shown in FIG. 1) 500 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a read only memory (ROM) 502 or a program loaded from a storage means 508 into a random access memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; a storage device 508 including, for example, a hard disk; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 5 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a first neural network for executing a deep learning task; and search out a second neural network for executing the deep learning task by executing a plurality of iterative operations; wherein the iterative operation comprises: updating a preset model structure controller based on the current reward feedback value, and generating a candidate neural network by using the updated model structure controller, wherein the structural complexity of the candidate neural network is lower than that of the first neural network, and the initial value of the reward feedback value is a preset value; distilling the candidate neural network based on the first neural network, and determining a distillation loss function of the distilled candidate neural network; updating the reward feedback value based on the distillation loss function of the distilled candidate neural network; and in response to determining that the reward feedback value reaches a preset convergence condition or the accumulated number of iterative operations reaches a preset threshold, determining the distilled candidate neural network obtained in the current iterative operation as the searched second neural network.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit and a search unit. Where the names of these units do not in some cases constitute a limitation of the unit itself, for example, the acquisition unit may also be described as a "unit acquiring a first neural network for performing a deep learning task".
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is possible without departing from the inventive concept as defined above. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A model generation method, comprising:
acquiring a first neural network for executing a deep learning task;
searching out a second neural network for executing the deep learning task by executing a plurality of iterative operations;
wherein the iterative operation comprises:
updating a preset model structure controller based on a current reward feedback value, and generating a candidate neural network by using the updated model structure controller, wherein the structural complexity of the candidate neural network is lower than that of the first neural network, and the initial value of the reward feedback value is a preset numerical value;
distilling the candidate neural network based on the first neural network, and determining a distillation loss function of the distilled candidate neural network;
updating the reward feedback value based on a distillation loss function of the distilled candidate neural network;
and in response to determining that the reward feedback value reaches a preset convergence condition or the accumulated number of iterative operations reaches a preset threshold, determining the distilled candidate neural network obtained in the current iterative operation as the searched second neural network.
2. The method of claim 1, the generating a candidate neural network with the updated model structure controller, comprising:
and determining variable structure parameters of the network structure from a preset network structure search space by adopting the updated model structure controller, and generating a coding sequence for representing the structure of the candidate neural network.
3. The method of claim 1 or 2, wherein the model structure controller comprises a recurrent neural network; and
the updating of the preset model structure controller based on the current reward feedback value comprises:
updating the parameters of the recurrent neural network by adopting a back propagation algorithm based on the current reward feedback value so that the updated model structure controller generates a candidate neural network which increases the reward feedback value.
4. The method of claim 1 or 2, wherein updating the preset model structure controller based on the current reward feedback value comprises:
generating a plurality of neural network models as a population based on a preset network structure search space, and taking the current reward feedback value as the fitness of a candidate neural network generated in the current iteration operation in the population;
and updating the model constructor based on the fitness of the candidate neural network in the population so that the model constructor evolves the population through a genetic algorithm in the next iteration operation to generate a neural network model with the increased fitness as the candidate neural network of the next iteration operation.
5. The method of claim 1, wherein the distillation loss function comprises: a first loss function characterizing differences between features extracted by the first neural network and the candidate neural network, respectively; or
The distillation loss function comprises:
the first loss function and a second loss function characterizing a difference between results of execution of the deep learning task by the first neural network and the candidate neural network, respectively.
6. A model generation apparatus comprising:
an acquisition unit configured to acquire a first neural network for performing a deep learning task;
a search unit configured to search out a second neural network for performing the deep learning task by performing a plurality of iterative operations;
wherein the iterative operation comprises:
updating a preset model structure controller based on a current reward feedback value, and generating a candidate neural network by using the updated model structure controller, wherein the structural complexity of the candidate neural network is lower than that of the first neural network, and the initial value of the reward feedback value is a preset numerical value;
distilling the candidate neural network based on the first neural network, and determining a distillation loss function of the distilled candidate neural network;
updating the reward feedback value based on the distillation loss function of the distilled candidate neural network;
and in response to determining that the reward feedback value satisfies a preset convergence condition or that the accumulated number of iterative operations reaches a preset threshold, determining the distilled candidate neural network obtained in the current iterative operation as the searched second neural network.
7. The apparatus of claim 6, wherein, in the iterative operation performed by the search unit, the generating a candidate neural network using the updated model structure controller comprises:
determining, using the updated model structure controller, values of variable structural parameters of the network structure from a preset network structure search space, and generating an encoding sequence representing the structure of the candidate neural network.
8. The apparatus of claim 6 or 7, wherein the model structure controller comprises a recurrent neural network; and
in the iterative operation performed by the search unit, the updating of the preset model structure controller based on the current reward feedback value comprises:
updating parameters of the recurrent neural network using a back propagation algorithm based on the current reward feedback value, so that the updated model structure controller generates a candidate neural network that increases the reward feedback value.
9. The apparatus of claim 6 or 7, wherein, in the iterative operation performed by the search unit, the updating of the preset model structure controller based on the current reward feedback value comprises:
generating a plurality of neural network models as a population based on a preset network structure search space, and taking the current reward feedback value as the fitness, within the population, of the candidate neural network generated in the current iterative operation;
and updating the model structure controller based on the fitness of the candidate neural network in the population, so that in the next iterative operation the model structure controller evolves the population through a genetic algorithm to generate a neural network model with increased fitness as the candidate neural network of the next iterative operation.
10. The apparatus of claim 6, wherein the distillation loss function comprises: a first loss function characterizing a difference between the features respectively extracted by the first neural network and the candidate neural network; or
the distillation loss function comprises: the first loss function and a second loss function characterizing a difference between the results of the deep learning task respectively executed by the first neural network and the candidate neural network.
11. An electronic device, comprising:
one or more processors;
a storage device storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-5.
CN201911045657.9A 2019-10-30 2019-10-30 Model generation method and device Pending CN110766142A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911045657.9A CN110766142A (en) 2019-10-30 2019-10-30 Model generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911045657.9A CN110766142A (en) 2019-10-30 2019-10-30 Model generation method and device

Publications (1)

Publication Number Publication Date
CN110766142A true CN110766142A (en) 2020-02-07

Family

ID=69334580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911045657.9A Pending CN110766142A (en) 2019-10-30 2019-10-30 Model generation method and device

Country Status (1)

Country Link
CN (1) CN110766142A (en)

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310901B (en) * 2020-02-24 2023-10-10 北京百度网讯科技有限公司 Method and device for acquiring samples
CN111310901A (en) * 2020-02-24 2020-06-19 北京百度网讯科技有限公司 Method and device for obtaining a sample
CN111340221A (en) * 2020-02-25 2020-06-26 北京百度网讯科技有限公司 Method and device for sampling neural network structure
CN111353585A (en) * 2020-02-25 2020-06-30 北京百度网讯科技有限公司 Structure searching method and device of neural network model
CN111340221B (en) * 2020-02-25 2023-09-12 北京百度网讯科技有限公司 Neural network structure sampling method and device
CN113361701A (en) * 2020-03-04 2021-09-07 北京百度网讯科技有限公司 Quantification method and device of neural network model
CN113469352A (en) * 2020-03-31 2021-10-01 上海商汤智能科技有限公司 Neural network model optimization method, data processing method and device
CN111523640A (en) * 2020-04-09 2020-08-11 北京百度网讯科技有限公司 Training method and device of neural network model
CN111523640B (en) * 2020-04-09 2023-10-31 北京百度网讯科技有限公司 Training method and device for neural network model
CN111539514A (en) * 2020-04-16 2020-08-14 北京百度网讯科技有限公司 Method and apparatus for generating structure of neural network
CN111539514B (en) * 2020-04-16 2023-06-06 北京百度网讯科技有限公司 Method and apparatus for generating a structure of a neural network
CN111582479B (en) * 2020-05-09 2023-10-27 北京百度网讯科技有限公司 Distillation method and device for neural network model
CN111582452B (en) * 2020-05-09 2023-10-27 北京百度网讯科技有限公司 Method and device for generating neural network model
CN111582481B (en) * 2020-05-09 2023-10-03 北京百度网讯科技有限公司 Method and device for distilling a model
CN111582481A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Method and apparatus for distilling a model
CN111582479A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Distillation method and device of neural network model
CN111582452A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Method and device for generating neural network model
CN113705805A (en) * 2020-05-22 2021-11-26 中国科学技术大学 Deep reinforcement learning method and device based on state representation learning
CN111667056A (en) * 2020-06-05 2020-09-15 北京百度网讯科技有限公司 Method and apparatus for searching model structure
CN111667056B (en) * 2020-06-05 2023-09-26 北京百度网讯科技有限公司 Method and apparatus for searching model structures
CN111695698B (en) * 2020-06-12 2023-09-12 北京百度网讯科技有限公司 Method, apparatus, electronic device, and readable storage medium for model distillation
CN111695698A (en) * 2020-06-12 2020-09-22 北京百度网讯科技有限公司 Method, device, electronic equipment and readable storage medium for model distillation
CN111914994A (en) * 2020-06-18 2020-11-10 北京百度网讯科技有限公司 Method and device for generating multilayer perceptron, electronic equipment and storage medium
CN111914994B (en) * 2020-06-18 2024-01-12 北京百度网讯科技有限公司 Generation method and device of multi-layer perceptron, electronic equipment and storage medium
CN111753761A (en) * 2020-06-28 2020-10-09 北京百度网讯科技有限公司 Model generation method and device, electronic equipment and storage medium
CN111753761B (en) * 2020-06-28 2024-04-09 北京百度网讯科技有限公司 Model generation method, device, electronic equipment and storage medium
TWI755149B (en) * 2020-07-31 2022-02-11 大陸商上海商湯智能科技有限公司 Model determination method and related terminal and computer readable storage medium
CN111882048A (en) * 2020-09-28 2020-11-03 深圳追一科技有限公司 Neural network structure searching method and related equipment
JP7381814B2 (en) 2020-12-15 2023-11-16 之江実験室 Automatic compression method and platform for pre-trained language models for multitasking
JP7283835B2 (en) 2020-12-17 2023-05-30 之江実験室 Automatic Compression Method and Platform for Pre-trained Language Models Based on Multilevel Knowledge Distillation
CN112560985A (en) * 2020-12-25 2021-03-26 北京百度网讯科技有限公司 Neural network searching method and device and electronic equipment
CN112560985B (en) * 2020-12-25 2024-01-12 北京百度网讯科技有限公司 Neural network searching method and device and electronic equipment
WO2022250609A1 (en) * 2021-05-28 2022-12-01 脸萌有限公司 Data protection method, network structure training method and apparatus, medium, and device

Similar Documents

Publication Publication Date Title
CN110766142A (en) Model generation method and device
CN110807515B (en) Model generation method and device
KR102208989B1 (en) Device placement optimization through reinforcement learning
CN110852421B (en) Model generation method and device
CN110852438B (en) Model generation method and device
CN110366734B (en) Optimizing neural network architecture
US20210004677A1 (en) Data compression using jointly trained encoder, decoder, and prior neural networks
EP3446260B1 (en) Memory-efficient backpropagation through time
CN112699991A (en) Method, electronic device, and computer-readable medium for accelerating information processing for neural network training
CN110520871A (en) Training machine learning model
CN111523640B (en) Training method and device for neural network model
CN110555714A (en) method and apparatus for outputting information
US11922281B2 (en) Training machine learning models using teacher annealing
CN111602148A (en) Regularized neural network architecture search
CN111582479B (en) Distillation method and device for neural network model
CN111340220B (en) Method and apparatus for training predictive models
CN111340221B (en) Neural network structure sampling method and device
CN108920717B (en) Method and device for displaying information
CN110663049A (en) Neural network optimizer search
US11900263B2 (en) Augmenting neural networks
CN111368973B (en) Method and apparatus for training a super network
CN111353601A (en) Method and apparatus for predicting delay of model structure
CN116684330A (en) Traffic prediction method, device, equipment and storage medium based on artificial intelligence
CN114972877A (en) Image classification model training method and device and electronic equipment
CN113448821B (en) Method and device for identifying engineering defects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination