CN111652354B

CN111652354B - Method, apparatus, device and storage medium for training super network

Info

Publication number: CN111652354B
Application number: CN202010479963.XA
Authority: CN
Inventors: 希滕; 张刚; 温圣召
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-05-29
Filing date: 2020-05-29
Publication date: 2023-10-24
Anticipated expiration: 2040-05-29
Also published as: CN111652354A

Abstract

The application discloses a method, a device, equipment and a storage medium for training a super network, which relate to the technical field of artificial intelligence and further relate to the technical field of deep learning. The specific implementation scheme is as follows: determining a plurality of sub-network sets according to a pre-established search space of the super-network; the following iterative operations are performed a plurality of times: selecting a plurality of sub-networks from the sub-network set; updating the sub-network and the super-network, determining comparison characteristics from the characteristics extracted from the updated sub-network, and determining comparison super-network from the updated super-network; updating the comparison super network based on the extracted features and the comparison features; and updating the super network according to the updated comparison super network. The super network obtained by training in the implementation mode has higher precision, and the sub network sampled from the super network has consistent performance with the independently trained network. Therefore, when the super network of the implementation mode is applied to the field of image processing, the sub network with excellent performance can be quickly searched out based on the NAS.

Description

Method, apparatus, device and storage medium for training super network

Technical Field

The application relates to the technical field of computers, in particular to the technical field of artificial intelligence, further relates to the technical field of deep learning, and in particular relates to a method, a device, equipment and a storage medium for training a super network.

Background

Deep neural networks have achieved significant results in many areas. The structure of the deep neural network model has a direct impact on its performance. The structure of the traditional neural network model is designed according to experience by an expert, rich expert knowledge is needed, and the design cost of the network structure is high.

NAS (Neural Architecture Search, network structure automatic search) is to replace complicated manual operation with algorithm to automatically search out the optimal neural network architecture. In one current approach, the super-network is trained by pre-building the super-network containing all possible model structures. Then in the actual deep learning task, a proper sub-network is searched out from the super-network through the NAS and used as a neural network model for executing the deep learning task.

However, since all network structures in the super network coexist, there is a problem of mutual exclusion in training the super network. In order to give consideration to the better performance of all network structures, the super-network training process can lead to a larger gap between the performance of the network structures and the performance of the independently trained network.

Disclosure of Invention

A method, apparatus, device, and storage medium for training a super network are provided.

According to a first aspect, there is provided a method for training a super network, comprising: determining a plurality of sub-network sets according to a pre-established search space of the super-network, wherein the sub-networks in the sub-network sets meet an orthogonal relationship; based on the plurality of sub-network sets and the super-network, performing the following iterative operations a plurality of times: selecting a plurality of sub-networks from at least one sub-network set; updating the plurality of sub-networks and the super-network, determining comparison characteristics from the characteristics extracted from the updated plurality of sub-networks, and determining the comparison super-network from the updated super-network; updating the comparison super network based on the extracted characteristics of the plurality of sub networks and the comparison characteristics; and updating the super network according to the updated comparison super network.

According to a second aspect, there is provided an apparatus for training a super network, comprising: a determining unit configured to determine a plurality of sub-network sets according to a search space of a pre-established super-network, wherein the sub-networks in the sub-network sets satisfy an orthogonal relationship; an iteration unit configured to perform an iteration operation based on the plurality of sub-network sets and the super-network, a plurality of times by: a selection module configured to select a plurality of sub-networks from at least one set of sub-networks; the comparison module is configured to update the plurality of sub-networks and the super-network, determine comparison characteristics from the characteristics extracted from the updated plurality of sub-networks, and determine a comparison super-network from the updated super-network; a first updating module configured to update the comparative super network based on the features extracted by the plurality of sub-networks and the comparative features; and the second updating module is configured to update the super network according to the updated comparison super network.

According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the first aspect.

According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method as described in the first aspect.

The super network obtained by training according to the technology of the application has higher precision, and the sub network sampled from the super network obtained by training has the same performance as the network with the same structure by independent training. Therefore, when the super network obtained by training the implementation mode is applied to the field of image processing, the sub network with excellent performance can be quickly searched out based on the super network automatic search model structure.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:

FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present application may be applied;

FIG. 2 is a flow chart of one embodiment of a method for training a super network according to the present application;

FIG. 3 is a flow chart of another embodiment of a method for training a super network according to the present application;

FIG. 4 is a schematic diagram of an embodiment of an apparatus for training a super network in accordance with the present application;

fig. 5 is a block diagram of an electronic device for implementing a method for training a super network in accordance with an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.

FIG. 1 illustrates an exemplary system architecture 100 in which embodiments of a method for training a super network or an apparatus for training a super network of the present application may be applied.

As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. For example, a user may send a deep learning task request related to tasks such as voice interaction, text classification, image recognition, keypoint detection, etc. with the server 105 via the network 104 via the terminal devices 101, 102, 103. Various communication client applications, such as an image processing type application, an information analysis type application, a voice assistant type application, a shopping type application, a financial type application, and the like, may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smartphones, tablets, car-mounted computers, laptop and desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the above-listed electronic devices. Which may be implemented as multiple software or software modules (e.g., to provide distributed services), or as a single software or software module. The present application is not particularly limited herein.

The server 105 may be a server running various services, such as a server running an image data-based object tracking service or a voice data-based voice processing service. The server 105 may acquire or determine a neural network for implementing the various services described above. The server 105 may previously acquire deep learning task data to construct training samples, and train a neural network for implementing the above-described various services using the training samples. Upon receiving the task request, the server 105 may implement automatic searching and optimization of the model structure of the neural network. Specifically, the server 105 may implement an automatic search of the model structure of the neural network through the super network.

The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When server 105 is software, it may be implemented as a plurality of software or software modules (e.g., to provide distributed services), or as a single software or software module. The present application is not particularly limited herein.

It should be noted that the method for training a super network provided by the embodiments of the present disclosure is generally performed by the server 105. Accordingly, the means for training the super network is typically provided in the server 105.

In some scenarios, server 105 may obtain source data (e.g., training samples, super networks to be trained) needed for super network training from a database, memory, or other device, at which point exemplary system architecture 100 may be absent of terminal devices 101, 102, 103, and network 104.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for training a super network in accordance with the present application is shown. The method for training the super network of the embodiment comprises the following steps:

step 201, determining a plurality of sub-network sets according to a pre-established search space of the super-network.

In this embodiment, an execution subject (e.g., the server 105 shown in fig. 1) of the method for training the super network may previously establish a super network. A super-network refers to a network that contains all search spaces, which may include multiple layers, each layer including multiple selectable substructures. Each sub-structure may include a variety of optional operators, e.g., convolution, pooling, etc. Each operator includes some super-parameters and weight parameters. The above-mentioned super-parameters may include, for example, the size of the convolution kernel, the convolution step size, etc. The execution body may sample the search space multiple times to generate multiple sub-networks. Specifically, the execution body may sample at least one sub-structure from each layer of the search space, and increase an output end for adding output results of the at least one sub-structure, to obtain each sub-network. In sampling, a variety of sampling strategies may be employed, such as random sampling, or Bernoulli sampling, and the like.

After obtaining the plurality of sub-networks, the execution body may divide the plurality of sub-networks to obtain a plurality of sub-network sets. Wherein the sub-networks included in each sub-network set satisfy an orthogonal relationship. Here, the sub-network satisfying the orthogonal relationship refers to a sub-network in which a sub-structure included between the sub-networks is not repeated. For example, the search space of the super network includes multiple layers, A, B, C, D layers respectively. The layer A comprises substructures A1, A2, A3 and A4, the layer B comprises substructures B1, B2, B3 and B4, the layer C comprises substructures C1, C2, C3 and C4, and the layer D comprises substructures D1, D2, D3 and D4. The sub-network 1 comprises sub-structures A1, B1, C1, D1, the sub-network 2 comprises A2, B2, C2, D2, the sub-network 3 comprises sub-structures A1, B3, C3, D3, and the sub-network 4 comprises sub-structures A4, B2, C3, D4. Then there is no duplicate sub-structure between sub-network 1 and sub-networks 2, 4 and thus sub-network 1 is mutually orthogonal to sub-networks 2, 4. The sub-network 1 and the sub-network 3 have a repeated sub-structure A1, so that the sub-network 1 and the sub-network 3 are not in an orthogonal relationship. Likewise, sub-network 2 and sub-network 3 have no duplicate sub-structure, and are in an orthogonal relationship. The sub-network 2 and the sub-network 4 have a repeated sub-structure B2, which are not in an orthogonal relationship. The sub-network 3 and the sub-network 4 have a repeated sub-structure C3, which are not in an orthogonal relationship.

Step 202, based on a plurality of sub-network sets and a super-network, performing the following iterative operations for a plurality of times:

after the execution body obtains a plurality of sub-network sets, the execution body can combine with the super-network to execute iterative operation for a plurality of times. The iterative operation may include steps 2021 to 2024.

Step 2021, selecting a plurality of sub-networks from the at least one set of sub-networks.

The execution body may first select at least one sub-network set from the plurality of sub-network sets. Specifically, the plurality of sub-network sets may be used as a training pool. The at least one selected set of sub-networks may be referred to herein as a sub-training pool. The executing body may then select a plurality of sub-networks from the sub-training pool. These sub-networks may be selected multiple times or sequentially.

In some optional implementations of this embodiment, the selected multiple sub-networks may belong to the same set of sub-networks. Therefore, training the super network by utilizing the mutually orthogonal sub-networks can quickly realize training of each sub-structure in the super network, and the mutual exclusion of the parameters of the trained sub-networks can not occur. For example, if both sub-network 1 and sub-network 2 include sub-structure 1, it is possible that the parameters of sub-structure 1 are very different in the trained sub-network 1 and the trained sub-network 2. The parameters of the sub-structure 1 in the sub-network 2 may significantly reduce the performance of the sub-network 1 if applied in the sub-network 1. This is the case for parameter mutual exclusion.

And 2022, updating the plurality of sub-networks and the super-network, determining a comparison characteristic from the characteristics extracted from the updated plurality of sub-networks, and determining the comparison super-network from the updated super-network.

In this embodiment, the execution body may update a plurality of sub-networks and super-networks. Specifically, the execution body may update each sub-network with the super-network, and may update the super-network with the updated sub-network. After updating each sub-network, the execution body can determine the extracted characteristics of each sub-network, and then select one characteristic from the characteristics as a comparison characteristic. In some specific implementations, the execution body may update the first sub-network in the selection, and take the features extracted by the updated sub-network as the comparison features. The execution body may also determine a comparison supernetwork from the updated supernetworks. Here, the execution body may update the super network a plurality of times, and each update may obtain an updated super network. The execution body may use the first updated supernetwork as a comparison supernetwork.

In step 2023, the comparison super-network is updated based on the extracted features of the plurality of sub-networks and the comparison features.

After determining the comparison characteristics, the execution main body can update the comparison super network by combining the characteristics extracted by each sub-network. Specifically, the execution subject may compare the extracted features of the sub-network with the compared features, and calculate the distance between the two. And selecting a sub-network corresponding to the characteristics for updating the contrast super-network according to the distance. In updating the comparative super network, the parameters of the super network may be updated by sharing the parameters of the sub network.

Step 2024 updates the super-network according to the updated comparative super-network.

After the execution body updates the comparison super network, the execution body can update the super network according to the updated comparison super network. Specifically, the updated contrast super network can be shared, so that the super network can be updated.

According to the method for training the super network, which is provided by the embodiment of the application, the super network is updated by utilizing the mutually orthogonal sub-networks, so that the training process is ensured to rapidly cover all the sub-structures of the super network, and the training efficiency is improved. In addition, due to orthogonality of the subnetworks, mutual exclusion of the parameters of the subnetworks does not exist in the training process of the subnetworks, and therefore performance of the subnetworks can be improved. And the characteristic of each sub-network is compared with the determined comparison characteristic in the training process, the super-network updated based on each sub-network is compared with the determined comparison super-network, and the super-network parameters with better performance are always used for updating the super-network parameters. Compared with the existing super network training method, the performance of all sub-networks of the super network does not need to be always evaluated in the training process, so that the hardware for executing the method reduces the calculated amount in the super network training process and improves the processing speed of the hardware. In addition, as the method of the embodiment always uses the super network with better performance to update the parameters of the super network, each iteration can improve the performance of the super network, so that the performance is more excellent, namely, the consistency of the training of the super network and the training of the independent sub network is maintained. Moreover, due to the excellent performance of the super network, when the super network is applied to a specific field (e.g., an image processing field), a sub-network having excellent performance can be rapidly extracted through the NAS.

With continued reference to fig. 3, a flow 300 of another embodiment of a method for training a super network in accordance with the present application is shown. As shown in fig. 3, the iterative operation in the method for training a super network of the present embodiment may include the steps of:

step 301, selecting a plurality of sub-networks from at least one sub-network set.

In this embodiment, the executing body may select a plurality of sub-networks by selecting one sub-network at a time. Specifically, the execution body may also use a counter to count the number of the selected subnetworks.

Step 302, taking the first selected sub-network as a first target sub-network.

The execution body may use the first selected sub-network as the first target sub-network. Specifically, the executing body may determine whether the value of the counter is an initial value, and if so, indicate that the current selection is first selection, and may use the sub-network selected this time as the first target sub-network.

Step 303, training the first target sub-network, and taking the features extracted by the trained first target sub-network as the comparison features.

The execution body may train the first target subnetwork. During training, the executing body may train the first target sub-network with training data of the deep learning task corresponding to the first target sub-network to update weights and biases in the first target sub-network. In some specific implementations, the executing body may train the first target subnetwork using a reflection propagation (BP) algorithm. The execution subject may take the features extracted by the trained first target subnetwork as contrast features. Here, the feature extracted by the first target subnetwork after training may be the feature extracted by all the substructures of the first target subnetwork, i.e. the feature obtained before the output layer.

And step 304, updating the super network according to the trained first sub-network, and taking the updated super network as a comparison super network.

The executing body may update the super network with the trained first sub-network. For example, the trained first sub-network is shared with the super-network to obtain an updated super-network. The execution body may take the updated supernetwork as a comparison supernetwork.

In some alternative implementations of the present embodiment, the step 304 may be specifically performed by: determining a gradient of the trained first target subnetwork; and updating the super network according to the gradient, and taking the updated super network as a comparison super network.

In this implementation, the execution body may calculate the gradient of the first target subnetwork. The goal of neural network training is to obtain minimized loss functions and model parameter values. And along the gradient direction, the reciprocal direction takes the maximum value, and the calculation speed is the fastest. The calculation process of the gradient descent method is to solve the minimum value along the gradient descent direction. After calculating the gradient of the trained first target sub-network, the execution subject can update the super-network according to the gradient, namely, share the gradient to the super-network, and then take the updated super-network as a comparison super-network.

And 305, updating the sub-network which is not selected for the first time by utilizing the comparison super-network, and training each updated sub-network.

The execution body may also update the non-first selected sub-network with the comparison super-network. Specifically, the execution body may share the parameters of the comparison super network to the sub-network that is not first selected. Each subnetwork is then trained.

And 306, updating the comparison super network according to the characteristics and the comparison characteristics extracted by each trained sub network.

The execution body may determine the extracted features of each trained sub-network. The feature here may also be the feature of the layer output preceding the output layer of each sub-network. And then, updating the comparison super network by combining the comparison characteristics.

In some alternative implementations of the present embodiment, the step 306 may be specifically performed by: for each sub-network, determining a distance between the extracted feature and the compared feature of the sub-network; if the distance is smaller than the threshold value, the parameters of the super network are updated and compared by the selected sub network.

In this implementation, each time the execution subject selects a sub-network, the distance between the feature extracted by the sub-network and the contrast feature may be calculated. If the distance is less than the preset threshold, the executing body can update the contrast super-network by using the sub-network or the sub-network selected for the first time. Here, if the distance is smaller than the preset threshold, it indicates that the performance of the two sub-networks is almost the same, and then one sub-network can be arbitrarily selected to update and compare with the super-network. If the distance is greater than the preset threshold, the performance of the two sub-networks is more different. At this time, a sub-network with better performance can be selected to update the comparison super-network. In determining the performance of the subnetwork, a pre-trained performance prediction model may be employed to predict the performance of the subnetwork. The inputs to the performance prediction model may be the structure and parameters of the sub-network and the outputs may be the loss function values of the sub-network. The smaller the loss function value, the better the performance of the subnetwork.

In step 307, the super-network is updated with the comparison super-network in response to the number of updates of the comparison super-network being equal to the preset threshold.

The execution body may also count the number of updates to the comparison super network, the number of updates being the same as the number of selections of the sub-network. I.e. the executing body may use the value of the counter to determine the number of updates to the supernetwork. If the number of updates of the comparison super network is equal to the preset threshold, the comparison super network may be utilized to update the super network.

According to the method for training the super network, which is provided by the embodiment of the application, one comparison sub-network can be determined in the training process, and the sub-network with better performance is always selected to update the super network by comparing each trained sub-network with the comparison sub-network, so that the consistency of the training of the super network and the training of the independent sub-network is ensured, and the sub-network with better performance can be extracted from the super network.

With further reference to fig. 4, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for training a super network, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.

As shown in fig. 4, the apparatus 400 for training a super network of the present embodiment includes: the determining unit 401 and the iterating unit 402 include a selection module 4021, a comparison module 4022, a first updating module 4023, and a second updating module 4024.

The determining unit 401 is configured to determine a plurality of sub-network sets according to a search space of a pre-established super-network. Wherein the sub-networks in the sub-network set satisfy an orthogonal relationship.

An iteration unit 402 configured to perform an iteration operation based on the plurality of sub-network sets and the super-network, a plurality of times by:

a selection module 4021 configured to select a plurality of sub-networks from the at least one set of sub-networks.

The comparison module 4022 is configured to update the plurality of sub-networks and the super-network, determine comparison features from features extracted from the updated plurality of sub-networks, and determine a comparison super-network from the updated super-network.

The first updating module 4023 is configured to update the comparison super network based on the extracted features of the plurality of sub-networks and the comparison feature.

The second updating module 4024 is configured to update the above-described supernetwork according to the updated comparative supernetwork.

In some optional implementations of the present embodiment, the comparison module is further configured to: taking the first selected sub-network as a first target sub-network; training the first target sub-network, and taking the characteristics extracted by the trained first sub-network as comparison characteristics; and updating the super network according to the trained first sub-network, and taking the updated super network as a comparison super network.

In some optional implementations of the present embodiment, the comparison module is further configured to: training the first target subnetwork by using a reflection propagation algorithm; and determining the characteristics extracted by the first target sub-network after training, and taking the extracted characteristics as the comparison characteristics.

In some optional implementations of the present embodiment, the comparison module is further configured to: determining a gradient of the trained first target subnetwork; and updating the super network according to the gradient, and taking the updated super network as a comparison super network.

In some optional implementations of the present embodiment, the first update module is further configured to: updating the sub-network which is not selected for the first time by utilizing the comparison super-network, and training each updated sub-network; and updating the comparison super network according to the characteristics extracted by each sub network after training and the comparison characteristics.

In some optional implementations of the present embodiment, the first update module is further configured to: for each sub-network, determining the distance between the extracted characteristic of the sub-network and the contrast characteristic; and if the distance is smaller than the preset threshold value, updating the parameters of the contrast super network by using the selected sub network.

In some optional implementations of this embodiment, the second updating module is further configured to: and in response to the update times of the comparison super network being equal to a preset threshold value, updating the super network by using the comparison super network.

It should be understood that the units 401 to 405 described in the apparatus 400 for training a super network correspond to the respective steps in the method described with reference to fig. 2. Thus, the operations and features described above with respect to the method for training a super network are equally applicable to the apparatus 400 and the units contained therein, and are not described in detail herein.

According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.

As shown in fig. 5, is a block diagram of an electronic device performing a method for training a super network according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.

As shown in fig. 5, the electronic device includes: one or more processors 501, memory 502, and interfaces for connecting components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the electronic device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories and multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is illustrated in fig. 5.

Memory 502 is a non-transitory computer readable storage medium provided by the present application. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for training a super network provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of the present application for training a super network.

The memory 502 is used as a non-transitory computer readable storage medium, and may be used to store a non-transitory software program, a non-transitory computer executable program, and modules, such as program instructions/modules corresponding to a method for training a super network in an embodiment of the present application (e.g., the selection module 4021, the comparison module 4022, the first update module 4023, and the second update module 4024 included in the determination unit 401 and the iteration unit 402 shown in fig. 4). The processor 501 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 502, i.e., implements the method for training a super network in the method embodiments described above.

Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created from executing use of the electronic device for training the super network, and the like. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 502 may optionally include memory located remotely from processor 501, which may be connected via a network to an electronic device executing a training super network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device performing the method for training a super network may further include: an input device 503 and an output device 504. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus or otherwise, for example in fig. 5.

The input device 503 may receive input numeric or character information and generate key signal inputs related to performing user settings and function controls for the electronic device used to train the super network, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a joystick, one or more mouse buttons, a track ball, a joystick, and the like. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASIC (application specific integrated circuit), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical scheme of the embodiment of the application, the super network obtained by training has higher precision, and the sub network sampled from the super network obtained by training has the same performance as the network with the same structure of independent training. Therefore, when the super network obtained by training the implementation mode is applied to the field of image processing, the sub network with excellent performance can be quickly searched out based on the super network automatic search model structure.

It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed embodiments are achieved, and are not limited herein.

The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims

1. A method for training a super network, applied in the field of image processing, the method comprising:

determining a plurality of image recognition sub-network sets according to a pre-established search space of the super-network, wherein the image recognition sub-networks in the image recognition sub-network sets meet an orthogonal relationship;

based on the plurality of image recognition sub-network sets and the super-network, performing the following iterative operations a plurality of times:

selecting a plurality of image recognition sub-networks from at least one image recognition sub-network set;

updating the plurality of image recognition sub-networks and the super-network, determining contrast characteristics from the characteristics extracted from the updated plurality of image recognition sub-networks, and determining a contrast super-network from the updated super-network;

updating the comparison super network based on the characteristics extracted by the plurality of image recognition sub networks and the comparison characteristics;

updating the super network according to the updated comparison super network;

in the image recognition task, automatically searching an image recognition sub-network with high image recognition accuracy from the super network through a network structure to serve as a neural network model for executing the image recognition task; the neural network model is used for recognizing an input image as an image recognition result.

2. The method of claim 1, wherein the updating the plurality of image recognition sub-networks and the super-network, determining a comparison feature from features extracted from the updated plurality of image recognition sub-networks, and determining a comparison super-network from the updated super-network, comprises:

taking the first selected image recognition sub-network as a first target image recognition sub-network;

training the first target image recognition sub-network, and taking the characteristics extracted by the trained first image recognition sub-network as contrast characteristics;

and updating the super network according to the trained first image recognition sub-network, and taking the updated super network as a comparison super network.

3. The method of claim 2, wherein the training the first target image recognition sub-network, taking the features extracted by the trained first image recognition sub-network as contrast features, comprises:

training the first target image recognition sub-network by using a reflection propagation algorithm;

and determining the characteristics extracted by the first target image recognition sub-network after training, and taking the extracted characteristics as the contrast characteristics.

4. A method according to claim 3, wherein said updating the super-network from the trained first image recognition sub-network, taking the updated super-network as a comparison super-network, comprises:

determining the gradient of the trained first target image recognition sub-network;

and updating the super network according to the gradient, and taking the updated super network as a comparison super network.

5. The method of claim 1, wherein the updating the contrast super-network based on the extracted features of the plurality of image recognition sub-networks and the contrast features comprises:

updating the non-first selected image recognition sub-network by using the comparison super-network, and training each updated image recognition sub-network;

and updating the comparison super network according to the characteristics extracted by each trained image recognition sub network and the comparison characteristics.

6. The method of claim 5, wherein the identifying the extracted features of the sub-network, the comparison features, and updating the comparison super-network based on the trained images comprises:

for each image recognition sub-network, determining the distance between the extracted feature of the image recognition sub-network and the contrast feature;

and if the distance is smaller than a preset threshold value, updating the parameters of the contrast super network by using the selected image recognition sub-network.

7. The method of claim 1, wherein the updating the super network from the comparative super network comprises:

and in response to the update times of the comparison super network being equal to a preset threshold value, updating the super network by using the comparison super network.

8. An apparatus for training a super network, for use in the field of image recognition, the apparatus comprising:

a determining unit configured to determine a plurality of image recognition sub-network sets according to a search space of a pre-established super-network, wherein an orthogonal relationship is satisfied between image recognition sub-networks in the image recognition sub-network sets;

an iteration unit configured to identify a set of sub-networks and the super-network based on the plurality of images, the iteration operation being performed a plurality of times by:

a selection module configured to select a plurality of image recognition sub-networks from at least one image recognition sub-network set;

the comparison module is configured to update the plurality of image recognition sub-networks and the super-network, determine comparison characteristics from the characteristics extracted from the updated plurality of image recognition sub-networks, and determine a comparison super-network from the updated super-network;

a first updating module configured to update the contrast super-network based on the features extracted by the plurality of image recognition sub-networks and the contrast features;

a second updating module configured to update the super network according to the updated comparative super network;

a search unit configured to automatically search, in an image recognition task, an image recognition sub-network having high image recognition accuracy searched out from the super network through a network structure as a neural network model for executing the image recognition task; the neural network model is used for recognizing an input image as an image recognition result.

9. The apparatus of claim 8, wherein the contrast module is further configured to:

10. The apparatus of claim 9, wherein the contrast module is further configured to:

11. The apparatus of claim 10, wherein the contrast module is further configured to:

12. The apparatus of claim 8, wherein the first update module is further configured to:

13. The apparatus of claim 12, wherein the first update module is further configured to:

14. The apparatus of claim 8, wherein the second update module is further configured to:

15. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein, the liquid crystal display device comprises a liquid crystal display device,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-7.