CN115952829A - Network searching method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115952829A
CN115952829A (application CN202211585844.8A)
Authority
CN
China
Prior art keywords
network
optimized
sampling
structure code
code
Prior art date
Legal status
Pending
Application number
CN202211585844.8A
Other languages
Chinese (zh)
Inventor
才贺
冯天鹏
张召凯
王凡祎
Current Assignee
Oppo Chongqing Intelligent Technology Co Ltd
Original Assignee
Oppo Chongqing Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Oppo Chongqing Intelligent Technology Co Ltd filed Critical Oppo Chongqing Intelligent Technology Co Ltd
Priority to CN202211585844.8A
Publication of CN115952829A

Abstract

The disclosed embodiments relate to a network searching method and apparatus, an electronic device, and a storage medium, in the technical field of machine learning. The network searching method includes the following steps: acquiring a plurality of sampling networks from a sample data set; encoding a current sampling network among the plurality of sampling networks to obtain a current structure code, and aggregating the current structure code with historical structure codes to obtain a model mean; obtaining an optimized structure code according to the optimized precision loss of the model mean; and decoding the optimized structure code to obtain an optimized network structure, iterating according to the optimized network structure until the optimized network structures of all sampling networks are obtained, and determining a target network structure, so that an object to be processed can be processed based on the target network structure. This technical solution can improve the accuracy of network searching.

Description

Network searching method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to a network search method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Network architecture search selects a network structure with better performance from a pre-designed search space through an effective search strategy and evaluation method.
In the related art, neural network search may be performed based on a precision prediction algorithm. In this approach, sub-networks trained early are forgotten during training of the precision predictor, which limits the method and leads to low accuracy.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a network searching method and apparatus, an electronic device, and a storage medium, which overcome, at least to some extent, the problem of low accuracy caused by the limitations and disadvantages of the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a network search method, including: acquiring a plurality of sampling networks from a sample data set; encoding a current sampling network among the plurality of sampling networks to obtain a current structure code, and aggregating the current structure code with historical structure codes to obtain a model mean; obtaining an optimized structure code according to the optimized precision loss of the model mean; and decoding the optimized structure code to obtain an optimized network structure, iterating based on the optimized network structure until the optimized network structures of all sampling networks are obtained, and determining a target network structure, so that an object to be processed is processed based on the target network structure.
According to a second aspect of the present disclosure, there is provided a network search apparatus, including: a sampling module configured to acquire a plurality of sampling networks from a sample data set; an aggregation module configured to encode a current sampling network among the plurality of sampling networks to obtain a current structure code, and to aggregate the current structure code with historical structure codes to obtain a model mean; a decoding module configured to obtain an optimized structure code according to the optimized precision loss of the model mean; and a network structure determining module configured to decode the optimized structure code to obtain an optimized network structure, iterate based on the optimized network structure until the optimized network structures of all sampling networks are obtained, and determine a target network structure, so that an object to be processed is processed based on the target network structure.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the network search method of the first aspect described above and possible implementations thereof via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the network search method of the first aspect described above and possible implementations thereof.
According to the technical solution provided by the embodiments of the present disclosure, on the one hand, the current structure code of the current sampling network can be aggregated with the historical structure codes to obtain a model mean, and the optimized structure code can then be determined according to the optimized precision loss of the model mean, so as to obtain the target network structure. Because the network search is carried out in combination with the historical structure codes, the information of all historical structure codes is retained in the model. This solves the sample-forgetting problem of precision prediction models in the related art, avoids the limitation of relying only on the current sampling network, and increases the comprehensiveness of the information. On the other hand, aggregating the current structure code of the current sampling network with the historical structure codes retains the information of the historical structure codes during training, while the iteration also takes into account the information of the optimized network structure derived from the current structure code. Training can thus draw on information from multiple dimensions, which improves the accuracy of the obtained optimized precision loss and, in turn, the accuracy of the network search and of the target network structure.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure. It should be apparent that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived by those of ordinary skill in the art without inventive effort.
Fig. 1 is a flowchart illustrating a network search in the related art.
Fig. 2 is a schematic diagram illustrating an application scenario to which the network search method and the network search apparatus according to the embodiment of the present disclosure may be applied.
Fig. 3 schematically illustrates a schematic diagram of a network searching method according to an embodiment of the present disclosure.
Fig. 4 schematically illustrates a schematic diagram of performing a neural network search in an embodiment of the present disclosure.
Fig. 5 schematically illustrates a flow chart of obtaining a model mean value according to an embodiment of the present disclosure.
Fig. 6 schematically illustrates a schematic diagram of an iterative acquisition optimization network structure according to an embodiment of the present disclosure.
Fig. 7 schematically illustrates an overall flow chart of a network searching method in the embodiment of the present disclosure.
Fig. 8 schematically illustrates a block diagram of a network searching apparatus in an embodiment of the present disclosure.
Fig. 9 schematically illustrates a block diagram of an electronic device in an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the related art, a network architecture searching method based on precision prediction consists of an encoder, a decoder, and a precision predictor. Referring to fig. 1, the encoder encodes the network topology into a structure code; the decoder decodes a structure code back into a network topology; and the precision predictor predicts the real precision corresponding to an input structure code.
In the related art, before training starts, a specified number of sub-networks are randomly sampled and trained from scratch to obtain their real precision labels; these form the original labeled sample data set. The precision-prediction-based network architecture search then begins. Referring to fig. 1, the specific process may include the following steps: obtain a sub-network from the data set by random sampling and input it into the precision predictor for training; optimize the precision predictor to obtain the optimized precision; and decode the optimized precision value to obtain a better network topology.
In the above search process, the learning process of a neural network can be regarded as a process of establishing connections between neurons, or of modifying existing connections. Once determined, the structure of the neural network is difficult to adjust during learning. Furthermore, under model-capacity constraints, the model may discard old knowledge in order to learn new samples. Because each input to the precision predictor is the sub-network structure obtained by the current sampling, the learning content of early samples is easily forgotten after the current sample is learned, and the ability of incremental learning is lacking. Experimental statistics on sub-network accuracy show that the accuracy distribution of randomly sampled sub-networks is quite uneven: most network accuracies fall in the middle region, and few fall in the high-accuracy or extremely-low-accuracy regions. This causes severe non-uniformity in the training samples of the precision predictor, which affects its training and, in turn, introduces errors into its results.
In order to solve the technical problem in the related art, the embodiments of the present disclosure provide a network search method, which may be applied to any scenario in which a network search is performed on any model. Fig. 2 is a schematic diagram illustrating a system architecture to which the network search method and apparatus according to the embodiment of the present disclosure may be applied.
As shown in fig. 2, the system architecture 200 may include a terminal 210 and a server 220. The terminal 210 may be any type of device capable of deploying a model and performing a computing function, for example, a computer, a smartphone, a smart television, a tablet computer, a smart wearable device (such as AR glasses), a robot, an unmanned aerial vehicle, and the like, as long as the network search and the model computation can be performed. The server 220 may perform a network search, and may send a target network structure obtained by the network search to the terminal, so that the terminal processes the object to be processed based on the target network structure. In addition, the terminal can also perform network search by itself to obtain a target network structure, so that the obtained target network structure is used for performing data processing on the object to be processed.
The server can encode the current sampling network through an encoder and aggregate the resulting structure code with past historical structure codes to obtain a model mean. The model mean is input into the precision predictor for training and optimization to obtain a more accurate optimized structure code, which is then decoded into an optimized network structure (an optimized current sampling network). In the iterative process, the optimized network structure is also added, as an old model, to the model base of past historical structure codes, where it constrains subsequent training.
In some embodiments, when the terminal 210 is different from the server 220, the data of the object to be processed may be obtained from the terminal 210 and sent to the server, a network search is performed in the server to obtain a target network structure, and the data of the object to be processed is processed by the server to obtain a processing result. Further, the processing result may be sent to the terminal 210 for subsequent processing.
It should be noted that the network search method provided by the embodiment of the present disclosure may be executed by the terminal 210 or the server 220, which is specifically defined according to the location of network deployment and actual requirements.
Fig. 3 schematically shows a network searching method in the embodiment of the present disclosure, which specifically includes the following steps:
step S310, acquiring a plurality of sampling networks from the sample data set;
step S320, encoding a current sampling network among the plurality of sampling networks to obtain a current structure code, and aggregating the current structure code with historical structure codes to obtain a model mean;
step S330, obtaining an optimized structure code according to the optimized precision loss of the model mean value;
step S340, decoding the optimized structure code to obtain an optimized network structure, iterating based on the optimized network structure until the optimized network structures of all sampling networks are obtained, determining a target network structure, and processing the object to be processed based on the target network structure.
In embodiments of the present disclosure, a plurality of sampling networks may be determined from a sample data set. The current sampling network may be any one of a plurality of sampling networks. The current structure code may be a structure code obtained by performing a coding process on the current sampling network output by the encoder. The historical structure code may be a structure code obtained by performing a coding process on all sampling networks located before the current sampling network, and output by the encoder. The model mean may be obtained by performing a weighted average of the current structure code and the historical structure code.
Furthermore, the model mean value can be used as the input of the precision predictor, and the precision predictor is used for carrying out real precision prediction on the model mean value to obtain the optimized precision loss of the model mean value corresponding to the current sampling network. And determining the optimized structure code corresponding to the optimized precision loss based on the optimized precision loss. On the basis, the optimized structure code can be input into a decoder to obtain the optimized network structure corresponding to the current sampling network.
Next, the optimized network structure corresponding to the current sampling network may be added to the historical structure code to update the historical structure code. And taking the next sampling network as the current sampling network, and carrying out iterative aggregation on the current structure code and the historical structure code of the current sampling network until all sampling networks are coded and decoded to obtain the corresponding optimized network structure, thereby obtaining the target network structure. Based on this, corresponding processing operations may be performed on the object to be processed based on the target network structure.
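The iteration described above (encode, aggregate with history, optimize under the predictor, decode, append the result to the history) can be sketched as a toy loop. This is an illustrative sketch under stated assumptions, not the patent's implementation: networks are represented simply as lists of layer widths, the encoder and decoder are trivial, and `loss_fn` stands in for the learned precision predictor.

```python
def encode(network):
    """Toy encoder: the network is a list of layer widths; use it as the code."""
    return [float(w) for w in network]

def decode(code):
    """Toy decoder: round the structure code back to integer layer widths."""
    return [max(1, round(c)) for c in code]

def aggregate(current_code, history, weight=0.5):
    """Blend the current code with the mean of all historical codes."""
    if not history:
        return list(current_code)
    hist_mean = [sum(col) / len(history) for col in zip(*history)]
    return [weight * c + (1 - weight) * h for c, h in zip(current_code, hist_mean)]

def optimize_code(code, loss_fn, lr=0.1, steps=10):
    """Coordinate-wise local search standing in for the predictor's
    gradient optimization: nudge each dimension downhill on the loss."""
    code = list(code)
    for _ in range(steps):
        for i in range(len(code)):
            base = loss_fn(code)
            code[i] += lr            # try moving up
            if loss_fn(code) > base:
                code[i] -= 2 * lr    # try moving down instead
                if loss_fn(code) > base:
                    code[i] += lr    # neither direction helped; restore
    return code

def search(sampled_networks, loss_fn):
    """One pass of the iterative search over all sampling networks."""
    history, best = [], None
    for net in sampled_networks:
        current = encode(net)                          # step S320: encode
        mean_code = aggregate(current, history)        # step S320: aggregate
        optimized = optimize_code(mean_code, loss_fn)  # step S330: optimize
        structure = decode(optimized)                  # step S340: decode
        history.append(encode(structure))              # optimized net joins the history
        if best is None or loss_fn(encode(structure)) < loss_fn(encode(best)):
            best = structure
    return best
```

With a loss that penalizes distance from some ideal widths, each iteration pulls the sampled network toward lower predicted loss while the growing history constrains later steps, mirroring the role the text assigns to the model base of old structures.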
In the above technical solution, on the one hand, the current structure code of the current sampling network can be aggregated with the historical structure codes to obtain a model mean, and the optimized structure code can then be determined according to the optimized precision loss of the model mean, so as to obtain the target network structure. Because the network search is carried out in combination with the historical structure codes, the information of all historical structure codes is retained in the model, which solves the sample-forgetting problem of precision prediction models in the related art, avoids the limitation of relying only on the current sampling network, and increases the comprehensiveness of the information. On the other hand, aggregating the current structure code with the historical structure codes retains the information of the historical structure codes during training while also considering the information of the current structure code, so training can draw on information from multiple dimensions. This improves the accuracy of the obtained optimized precision loss, and further improves the accuracy of the network search and of the target network structure.
Next, referring to fig. 3, each step of the network searching method in the embodiment of the present disclosure will be described in detail.
In step S310, a plurality of sampling networks are acquired from the sample data set.
In the embodiment of the present disclosure, because the search space usually involves huge numbers of parameters and large amounts of computation, and model deployment is constrained in some scenarios and on some devices, a network search is needed to determine a better-performing network structure and improve model accuracy.
The sample data set refers to a labeled sample data set: a plurality of sub-networks are randomly sampled from the search space and trained from scratch to obtain trained sub-networks and the real precision label corresponding to each sub-network; the trained sub-networks with real precision labels are then used as the labeled sample data set. Thus, the sample data set may comprise a plurality of trained sub-networks with real precision labels.
Experimental statistics on sub-network accuracy show that the accuracy of randomly sampled sub-networks is unevenly distributed: most network accuracies fall in the middle region, and few fall in the high-accuracy or extremely-low-accuracy regions. This uneven distribution can make the training samples of the precision predictor non-uniform, which may affect its training and introduce errors into its results.
To eliminate this influence on the subsequent precision predictor, and to address the problem of uneven sub-network sampling, the precision range of the sample data set can be divided into a plurality of intervals. During sampling, the same number of samples is drawn from each interval of the sample data set, which guarantees that the samples are uniform across precision intervals and determines the plurality of sampling networks. Uniformly sampling the sample data set over equally divided precision intervals improves sample uniformity and the effectiveness of the precision predictor's training data; training the precision predictor on valid, uniform samples can effectively improve its training effect and the overall precision of the model.
For example, the precision interval may be divided into N intervals, and uniform sampling may be performed from each interval in the sample data set to obtain a plurality of sampling networks.
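The interval-wise uniform sampling described above can be sketched as follows. This is a minimal sketch, not the patent's code: `labeled_networks`, `num_bins`, and `per_bin` are assumed names, and each element is a (network, accuracy) pair.

```python
import random

def stratified_sample(labeled_networks, num_bins, per_bin, seed=0):
    """Divide the accuracy range into equal-width intervals and draw the
    same number of sub-networks from each, so the sampled set is uniform
    across precision intervals rather than clustered in the middle."""
    rng = random.Random(seed)
    accs = [acc for _, acc in labeled_networks]
    lo, hi = min(accs), max(accs)
    width = (hi - lo) / num_bins or 1.0  # guard against a zero-width range
    bins = [[] for _ in range(num_bins)]
    for net, acc in labeled_networks:
        idx = min(int((acc - lo) / width), num_bins - 1)  # clamp top edge
        bins[idx].append((net, acc))
    sampled = []
    for bucket in bins:
        sampled.extend(rng.sample(bucket, min(per_bin, len(bucket))))
    return sampled
```

Drawing `per_bin` networks from each of the `num_bins` intervals is what keeps high-accuracy and very-low-accuracy networks represented in the predictor's training data.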
Next, in step S320, a current sampling network in the plurality of sampling networks is encoded to obtain a current structure code, and the current structure code is aggregated with the historical structure code to obtain a model mean.
In the embodiment of the disclosure, when network search is performed, search can be performed in a search space based on a search strategy, so that accurate network search is realized. The network searching mode can be architecture searching of a neural network. The architecture search of the neural network is a method for automatically searching out the optimal network architecture by training and calculating through a machine. The network architecture search can select a network structure on a pre-designed search space through a preset search strategy and an evaluation method.
The search process of neural architecture search is shown in fig. 4. First, a specific search space is specified; the search space may be, for example, a Cell structure, a Block structure, or the like. Except for the input and output layers, the type of each intermediate layer is selectable, and different restricted selection ranges can usually be set for different layer types. The hyper-parameters of each layer of the network can also be selected, including the number of convolution kernels, the number of convolution-kernel channels, the height, the width, and the horizontal and vertical strides.
After the search space is determined, a network search may be performed according to a search strategy and a performance evaluation model. Search strategies include methods based on reinforcement learning, evolutionary algorithms, differentiation, weight sharing, and the like. Common performance evaluation methods include proxy-model evaluation and inherited super-network weight evaluation, among others. A sub-network structure is selected in the search space through a reasonable search strategy, and the network with optimal performance is finally selected through an accurate model-performance evaluation method.
In the embodiment of the present disclosure, the architecture search of a neural network may use methods based on reinforcement learning, evolutionary algorithms, super-network weight sharing, differentiable algorithms, precision prediction algorithms, and the like. The following description takes as an example a search method that optimizes the network structure based on precision prediction. Such a method can evaluate the performance of candidate sub-networks and alleviate the mismatch between the performance of sub-structures as evaluated during the search and their actual performance. The search space may define certain fixed combinations of several modules, set the number of channels, and so on, and may be determined by predefined candidate operations and hyper-parameters of the neural structure (e.g., the structure template, the connection method, the number of convolutional-layer channels for feature extraction in the initial stage, etc.).
The precision-prediction-based network architecture searching method can consist of an encoder, a decoder, and a precision predictor. The encoder encodes the network topology represented by a sampling network into a structure code; the precision predictor predicts the real precision corresponding to an input structure code; and the decoder decodes a structure code back into the topology of a sampling network, so that the optimized structure code can be restored to the corresponding optimized network.
The current sampling network may be a sub-network sampled at the current time and provided with a real precision tag, and may be any one of a plurality of sampling networks. Based on the structure formed by the encoder, the precision predictor and the decoder, the current sampling network can be used as the input of the encoder, and the encoder encodes the current sampling network to obtain the current structure code, so that the topological structure code represented by the current sampling network is converted into the structure code. The encoder may be used to encode all sampling networks to obtain a structural code corresponding to each sampling network. And, the corresponding structural codes of different sampling networks can be the same or different.
The historical structure code refers to the structure code of all sampling networks located before the current sampling network. The network structure in the historical structure code may be the same as or different from the network structure of the current structure code, and is not limited herein. The history structure code may include the same or different network structures.
In the embodiment of the disclosure, the current structure code and the historical structure code of the current sampling network can be aggregated to obtain a model mean value. Aggregation therein may be understood as model fusion. The model mean may contain model information of all models, not just information of the current sampling network. The model mean may be an average model. In some embodiments, the model mean may be obtained by performing weighted average on structures of the same type in the current structure code and the historical structure code according to dimensions. That is, the same type of structure may be weighted averaged by dimension. The dimension used for weighted averaging may be related to the search task, and may specifically be determined according to the task type of the search task, and the task types of the search tasks are different, and the corresponding dimensions used for weighted averaging are also different. The dimensions may include one or more parameters that may be used to represent the model in general, and are determined based on the task type of the search task in particular.
Based on this, the specific step of aggregating the current structure code and the historical structure code of the current sampling network to obtain a model average value may include: determining a task type of a search task; and determining a corresponding weight value according to the task type, and performing weighted average on the structures of the same type according to corresponding dimensionalities based on the weight value to determine the model mean value. The task type of the search task may include a pruning search task or a broad search task, among others. When the search task is a pruning search task, the dimension may be a dimension corresponding to the pruning search task, for example, the number of channels; when the search task is other broad search tasks, the dimension may be any one of a filter size, cov size, and pooling size, or a combination thereof, for example.
After determining the task type of the search task, a weight value corresponding to the task type may be determined. The weighted values corresponding to different task types can be different, and can be actually adjusted according to models corresponding to different task types, and the weighted values of the same task type can also be adjusted according to actual requirements, and the size is not particularly limited here. For any one of the structures with the same type in the current structure code and the historical structure code, the weight values of the structures can be the same or different, and are specifically limited according to actual requirements. After the weight values are determined, the dimensions of the same type of structures may be weighted and averaged according to the weight values to obtain a model mean. For example, the same dimensions of the same type of structures are weighted and averaged according to the weight values to obtain an average model represented by a model mean.
For example, when the task type of the search task is a pruning search task, a weight value corresponding to the task type, for example, a parameter 1, may be determined. And further averaging the channel numbers of the structures of the same type in the current structure code and the historical structure code according to the weight values so as to perform weighted average on the dimensions of all the structures in the current structure code and the historical structure code to obtain a model average value. For another example, for other broad search tasks, a weight value corresponding to the broad search task may be determined, for example, a parameter 2, and a filter size, a cov size, a pooling size, and the like of the same type of structure may be weighted and averaged according to the parameter 2 to obtain a corresponding model mean value.
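For a pruning search task, the dimension-wise weighted average described above might look like the sketch below. The list-of-channel-counts representation and the function name are illustrative assumptions; the patent does not fix the encoding.

```python
def model_mean(current_code, history_codes, weights=None):
    """Weighted average, dimension by dimension, over structure codes of the
    same type. Each code is a list of per-layer channel counts (an assumed
    representation for a pruning search task)."""
    codes = [current_code] + list(history_codes)
    if weights is None:
        weights = [1.0] * len(codes)  # equal weights by default
    total = sum(weights)
    return [sum(w * code[i] for w, code in zip(weights, codes)) / total
            for i in range(len(current_code))]
```

With equal weights this reduces to a plain per-dimension mean; task-specific weight values (the "parameter 1" / "parameter 2" of the text) simply reweight each code's contribution, for instance giving the current structure code more influence than the historical ones.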
Referring to fig. 5, a weight value may be determined, and based on the determined weight value, the dimensions of the structures of the same type in the current structure code 501 corresponding to the current sampling network and the historical structure code 502 are weighted and averaged to obtain a model mean value 503.
In the embodiment of the disclosure, each time a certain number of sampling networks are sampled, the current structure code corresponding to the current sampling network is aggregated with all historical structure codes preceding it, the mean value of the sub-network distribution is determined, and the historical structure codes and the current structure code are used together as training data. By aggregating the historical structure codes with the current structure code, the sample information represented by the historical structure codes is retained in the model, which alleviates the sample-forgetting problem of precision prediction models in the related art, avoids the limitation of predicting only from the current sampling network, and improves comprehensiveness and accuracy.
Next, with continued reference to fig. 3, in step S330, an optimized structure code is obtained according to the optimized precision loss of the model mean.
In the embodiment of the present disclosure, the model mean value may be input to the precision predictor, so that the precision predictor predicts the real precision loss of the model mean value. It should be noted that the model mean value may have an initial precision loss; in the precision predictor, gradient optimization is performed based on this initial precision loss to determine the optimized precision loss. The gradient optimization may be performed using a gradient descent method. The optimized precision loss differs from the initial precision loss: it is more accurate and closer to the real precision loss.
Illustratively, the step of obtaining the optimized precision loss using the gradient descent method may include: based on the initial precision loss, moving a preset step size along the gradient descent direction of the precision predictor to convert the initial precision loss into the optimized precision loss. Because the preset step size is small enough and its direction of movement is consistent with the gradient descent direction, the result obtained by the precision predictor at the current step is guaranteed to be an optimized result, namely the optimized precision loss. For example, the initial precision loss e1 may be optimized by moving a preset step size along the gradient descent direction of the precision predictor until the optimized precision loss e2 is obtained. Through this gradient descent procedure, the accuracy of the optimized precision loss can be improved.
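One way to picture this single descent step — purely as an illustrative sketch, with a toy quadratic function standing in for the trained precision predictor and a numerically estimated gradient standing in for backpropagation — is as follows:

```python
import numpy as np

def predictor_loss(code):
    # Toy stand-in for the precision predictor's loss surface;
    # the real predictor would be a learned model.
    return float(np.sum((code - 1.0) ** 2))

def grad(code, eps=1e-6):
    # Central-difference numerical gradient of the toy predictor.
    g = np.zeros_like(code)
    for i in range(code.size):
        step = np.zeros_like(code)
        step[i] = eps
        g[i] = (predictor_loss(code + step) - predictor_loss(code - step)) / (2 * eps)
    return g

code = np.array([3.0, -1.0])    # model mean (structure code), hypothetical values
e1 = predictor_loss(code)       # initial precision loss e1
code = code - 0.1 * grad(code)  # move a preset step along the descent direction
e2 = predictor_loss(code)       # optimized precision loss e2, smaller than e1
```

Because the preset step (0.1 here) is small and follows the negative gradient, `e2` comes out strictly below `e1`, mirroring the conversion from initial to optimized precision loss described above.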
After the optimized precision loss is obtained, the corresponding optimized structure code may be obtained based on it. The optimized structure codes corresponding to different optimized precision losses may differ, and the optimized structure code may be obtained by training based on the optimized precision loss. Illustratively, the model mean value may be trained based on the optimized precision loss so as to update its model parameters, resulting in the optimized structure code corresponding to the optimized precision loss. Since the optimized structure code is obtained based on the optimized precision loss, its accuracy can be improved.
With continued reference to fig. 3, in step S340, the optimized structure code is decoded to obtain an optimized network structure, until the optimized network structures of all sampling networks are obtained iteratively according to the optimized network structure, and a target network structure is determined, so as to perform a processing operation on an object to be processed based on the target network structure.
In the embodiment of the present disclosure, after the optimized structure code is obtained, it may be decoded by a decoder to obtain the optimized network structure corresponding to the current sampling network. For example, the current sampling network may be encoded to obtain a current structure code A, the optimized precision loss may then be obtained through the precision predictor, and the optimized structure code corresponding to that optimized precision loss may be obtained by training based on it. After the optimized structure code is obtained, it may be decoded by the decoder to obtain the optimized network structure A' corresponding to the current sampling network.
Fig. 6 schematically shows a flow chart of iteratively obtaining an optimized network structure, which, referring to fig. 6, mainly includes the following steps:
in step S610, determining a historical structure code according to an optimized network structure corresponding to the current sampling network;
in step S620, iterative search is performed on the next sampling network and the historical structure code to obtain an optimized network structure of the next sampling network, until the optimized network structure is determined for all sampling networks in the sample data set, so as to obtain the target network structure.
In the embodiment of the present disclosure, after the optimized network structure corresponding to the current sampling network is obtained, it may be folded back into the historical structure codes, updating the previous historical structure codes. Further, iterative training may be performed on the next sampling network and the historical structure codes to obtain the optimized network structure of the next sampling network, until the optimized network structure has been determined for all sampling networks in the sample data set, so as to obtain the target network structure. The target network structure refers to the optimal network structure searched out of the search space.
The next sampling network is a network sampled after the current sampling network. For the next sampling network, it is treated as the new current sampling network: it is encoded, and its structure code and the structures of the same type in the historical structure codes (which now contain the optimized network structure of the current sampling network) are weighted and averaged along the dimensions, based on the weight value corresponding to the task type, to realize aggregation and obtain the model mean value corresponding to the next sampling network. This model mean value differs from that of the current sampling network. Next, the optimized precision loss of this model mean value may be determined based on the precision predictor, and the model mean value may be trained based on the optimized precision loss to obtain the optimized structure code corresponding to the next sampling network. Further, this optimized structure code may be decoded to obtain the optimized network structure corresponding to the next sampling network.
On this basis, the optimized network structure corresponding to the next sampling network may in turn be merged into the historical structure codes so as to update them. Further, the sampling network after the next one is treated as the current sampling network and aggregated with the historical structure codes, and so on, until the optimized network structure is obtained. It should be noted that each sampling network in the sample data set may be aggregated with the historical structure codes to obtain its corresponding model mean value, and the optimized network structure corresponding to that model mean value is then obtained according to the optimized precision loss of the model mean value, until every sampling network in the sample data set has yielded its optimized network structure, so as to determine the target network structure.
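The iteration described above can be sketched as an outer loop — a hypothetical skeleton only, in which `encode`, `aggregate`, `optimize`, and `decode` are placeholders for the encoder, the weighted-average aggregation, the predictor-driven optimization, and the decoder:

```python
def search(sampling_networks, encode, aggregate, optimize, decode):
    """Hypothetical outer loop of the described iterative search.

    Each sampling network is encoded, aggregated with the history,
    optimized via the precision predictor, decoded, and then folded
    back into the history to constrain the following iterations.
    """
    history = []                     # historical structure codes
    optimized = None
    for net in sampling_networks:
        code = encode(net)           # current structure code
        mean = aggregate(code, history)   # model mean value
        opt_code = optimize(mean)    # via the optimized precision loss
        optimized = decode(opt_code)      # optimized network structure
        history.append(opt_code)     # merge back into the history
    return optimized                 # final result: target network structure

# Toy run with scalar "codes" and trivial stand-in callables.
result = search(
    [1.0, 3.0],
    encode=lambda n: n,
    aggregate=lambda c, h: (c + sum(h)) / (len(h) + 1),
    optimize=lambda m: m,
    decode=lambda c: c,
)
```

In the toy run the second iteration averages its code (3.0) with the stored history (1.0), so the loop returns 2.0 — illustrating how earlier results constrain later ones.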
In the embodiment of the disclosure, the structure code of each sampling network may be aggregated with the historical structure codes to obtain a model mean value representing the mean of the sub-network distribution, so that the historical structure codes and the current structure code are used together as training data; in the iterative process, the current sampling network is also added as a model to the model base of historical structure codes, exerting a constraint on subsequent training. This avoids the limitation of searching according to only one sampling network: the network search is carried out according to both the current sampling network and the historical structure codes, improving comprehensiveness and accuracy.
On the basis, after the target network structure is obtained, the object to be processed can be processed based on the target network structure. The object to be processed may be an image to be processed, a voice to be processed, a text to be processed, and the like, and the object to be processed may be specifically determined according to the type of the processing operation and the actual application scenario. Based on the method, the optimal target network structure obtained through network search can be used for processing the object to be processed, and the function corresponding to the processing operation is realized. The processing operation may be an operation in a target task scene, and is specifically determined according to the type of the target task. The target task can be various types of tasks, such as a classification task, a detection task and a segmentation task, an identification task, and the like, and can be determined according to an actual application scenario and actual requirements. Based on this, the processing operation may include, but is not limited to, a classification operation, a detection operation, a segmentation operation, and an identification operation, so that various types of operations may be implemented on the object to be processed based on the target network structure, which is not limited in detail herein.
Fig. 7 schematically shows a flow chart of the network search. Referring to fig. 7, the precision-prediction-based network architecture search method is composed of an encoder, a decoder, and a precision predictor. The network search mainly includes the following steps:
in step S701, a sample data set after precision equalization is acquired.
The sample data set may be a model data set. Before training begins, a specified number of sub-networks are randomly sampled in the search space and trained from scratch to obtain their real precision labels, yielding a labeled sample data set. Further, to address uneven sampling of sub-networks, the precision range may be equally divided into several parts, for example N precision intervals, and during sampling the same number of samples is drawn from each precision interval, ensuring that the samples are uniform and that the training data of the precision predictor is valid. Training the precision predictor on valid samples can effectively improve its training effect and the overall precision of the model.
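A minimal sketch of the precision-interval balancing described above, assuming each sampled sub-network carries a scalar accuracy label in a known range; the function name and `(network, accuracy)` pair layout are illustrative assumptions:

```python
import random

def balance_by_precision(samples, n_bins, per_bin, lo=0.0, hi=1.0, seed=0):
    """Split [lo, hi) into n_bins equal precision intervals and draw the
    same number of samples from each, so the training set is uniform.

    `samples` is a list of (network, accuracy) pairs; a bin holding
    fewer than `per_bin` samples contributes everything it has.
    """
    rng = random.Random(seed)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for net, acc in samples:
        # Clamp so acc == hi falls into the last interval.
        idx = min(int((acc - lo) / width), n_bins - 1)
        bins[idx].append((net, acc))
    balanced = []
    for b in bins:
        balanced.extend(rng.sample(b, min(per_bin, len(b))))
    return balanced

# Toy labeled set: two low-accuracy and three high-accuracy sub-networks.
samples = [("a", 0.1), ("b", 0.2), ("c", 0.6), ("d", 0.9), ("e", 0.95)]
balanced = balance_by_precision(samples, n_bins=2, per_bin=1)
```

With two intervals and one sample per interval, the skewed toy set yields one low-accuracy and one high-accuracy sub-network, which is the uniformity property the text argues keeps the predictor's training data valid.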
In step S702, the current sampling network obtained by uniform sampling is input to the encoder, so as to obtain the current structure code.
The encoder is responsible for encoding the network topology into a structure code. In other words, the encoder encodes the network topology represented by the current sampling network to obtain a current structure code on which feature processing can be performed. For example, the current sampling network is encoded to obtain a current structure code A.
In step S703, the current structure code and the historical structure code are aggregated to obtain a model mean.
In this step, during aggregation, a weighted average may be taken over the dimensions of the structures of the same type in the current structure code and the historical structure codes, that is, a weighted average along the dimensions of the configured search object. The weight value used for the weighted average can be adjusted in real time according to the specific models of different tasks. For example, a pruning search task mainly averages the number of channels, while other broader searches may take a weighted average of the filter size, the convolution kernel size, the pooling size, and the like, to obtain the model mean value.
In step S704, the model mean is input to the precision predictor for training and optimization, and an optimized structure code is obtained.
In this step, the precision predictor predicts the real precision corresponding to the input network structure code. For example, a gradient descent method may be used: starting from the initial precision loss e1, the real precision loss of the model mean value is predicted by the precision predictor to obtain the optimized precision loss e2. Moreover, the model parameters of the model mean value may be adjusted based on the optimized precision loss so as to update and train the model mean value and obtain the corresponding optimized structure code. Each sampling network, together with its historical structure codes, can be trained to generate an optimized structure code whose structure differs from the historical structure codes it aggregates. Because the current sampling network and the historical structure codes are aggregated into one model mean value, the model mean value can subsequently be trained and adjusted directly, retaining the information of the historical structure codes and improving accuracy.
In step S705, the decoder decodes the optimized structure code to obtain an optimized network structure. The decoder is responsible for decoding the network structure code into the topology of the network.
In step S706, the optimized network structure is merged into the historical structure codes so as to update them. The flow then returns to step S702 and continues until the optimized network structure has been determined for all sampling networks, yielding the target network structure obtained by the network search.
For example, the original historical structure code is updated according to the optimized network structure a' corresponding to the current sampling network, so as to determine the historical structure code.
In this step, steps S702 to S706 are performed for the next sampling network. Specifically, the structure code of the next sampling network is aggregated with the historical structure codes, into which the optimized network structure of the current sampling network has been fused, to obtain a model mean value; training is then performed according to the optimized precision loss of the next sampling network to obtain its optimized structure code; and the optimized structure code of the next sampling network is decoded to obtain its optimized network structure. Further, the optimized network structure of the next sampling network may be fused into the historical structure codes to update them, and steps S702 to S706 are repeated until the optimized network structures of all sampling networks are obtained, yielding the target network structure.
In the embodiment of the disclosure, during the iterative process the current network structure is also added to the historical structure codes as an old model; the current structure code and the historical structure codes are aggregated to obtain the model mean value, and the information of the old samples is retained in the model. Training the model mean value thus constrains subsequent training and alleviates, to a certain extent, the problem of the model forgetting samples. The limitation of considering only the current network structure is avoided, and comprehensiveness and accuracy are improved.
In addition, the uniform sampling of the samples of the sample data set is realized through the division of the precision interval, so that the uniformity of the training samples of the precision prediction model is ensured, and the effectiveness of the training data of the precision predictor is ensured. The precision predictor training is carried out on the premise of ensuring the sample to be effective, and the training effect of the precision predictor and the overall precision of the model can be effectively improved. The training effect of the precision prediction model is improved, the accuracy of network search is improved from two dimensions of a sample and a processing mode, and the network search effect is improved.
In an embodiment of the present disclosure, a network searching apparatus is provided, and referring to fig. 8, the network searching apparatus 800 may include:
a sampling module 801, configured to obtain a plurality of sampling networks from the sample data set;
an aggregation module 802, configured to encode a current sampling network in the multiple sampling networks to obtain a current structure code, and aggregate the current structure code and a historical structure code to obtain a model average;
a decoding module 803, configured to obtain an optimized structure code according to the optimized precision loss of the model mean;
a network structure determining module 804, configured to decode the optimized structure code to obtain an optimized network structure, and to determine a target network structure once the optimized network structures of all sampling networks have been obtained iteratively based on the optimized network structure, so as to perform a processing operation on an object to be processed based on the target network structure.
In an exemplary embodiment of the present disclosure, the sampling module includes: the data set acquisition module is used for acquiring a sample data set; and the precision division module is used for dividing the precision interval of the sample data set into a plurality of intervals and sampling the sample data set in each interval by the same uniform quantity so as to determine a plurality of sampling networks.
In an exemplary embodiment of the disclosure, the data set acquisition module is configured to: randomly sampling a plurality of sub-networks in a search space, and training the sub-networks to obtain a sample data set with real precision labels.
In an exemplary embodiment of the present disclosure, the aggregation module includes: and the weighted average module is used for carrying out weighted average on the structures of the same type in the current structure code and the historical structure code according to dimensionality to obtain the model average value.
In an exemplary embodiment of the present disclosure, the weighted average module includes: the type determining module is used for determining the task type of the search task; and the model mean value determining module is used for determining a corresponding weight value according to the task type and carrying out weighted average on the structures of the same type according to the dimension based on the weight value so as to determine the model mean value.
In an exemplary embodiment of the present disclosure, the decoding module includes: the loss prediction module is used for carrying out gradient optimization on the initial precision loss of the precision predictor and determining the optimized precision loss; and the training module is used for carrying out model training based on the optimization precision loss to obtain the optimization structure code.
In an exemplary embodiment of the present disclosure, the network structure determining module includes: the historical structure code updating module is used for determining a historical structure code according to the optimized network structure corresponding to the current sampling network; and the iterative search module is used for performing iterative search on the next sampling network and the historical structure codes to obtain the optimized network structure of the next sampling network until the optimized network structure is determined for all sampling networks in the sample data set so as to obtain the target network structure.
In an exemplary embodiment of the present disclosure, the iterative search module includes: the iterative aggregation module is used for aggregating the next sampling network and the historical structure codes to obtain a model mean value of the next sampling network; the training module is used for updating the optimized structure code according to the optimized precision loss of the model mean value of the next sampling network; and the model decoding module is used for decoding the optimized structure code to obtain the optimized network structure of the next sampling network.
It should be noted that, the specific details of each part in the network search apparatus have been described in detail in some embodiments of the network search method, and details that are not disclosed may refer to the embodiments of the method part, and thus are not described again.
Exemplary embodiments of the present disclosure also provide an electronic device. The electronic device may be the terminal 210 described above. In general, the electronic device may include a processor and a memory for storing executable instructions of the processor, the processor being configured to perform the above-described method via execution of the executable instructions.
The following takes the mobile terminal 900 in fig. 9 as an example to exemplarily describe the configuration of the electronic device. It will be appreciated by those skilled in the art that, apart from components specifically intended for mobile purposes, the configuration of fig. 9 can also be applied to devices of a fixed type.
As shown in fig. 9, the mobile terminal 900 may specifically include: the mobile communication terminal comprises a processor 901, a memory 902, a bus 903, a mobile communication module 904, an antenna 1, a wireless communication module 905, an antenna 2, a display screen 906, a camera module 907, an audio module 908, a power supply module 909 and a sensor module 910.
Processor 901 may include one or more processing units, such as: the Processor 901 may include an AP (Application Processor), a modem Processor, a GPU (Graphics Processing Unit), an ISP (Image Signal Processor), a controller, an encoder, a decoder, a DSP (Digital Signal Processor), a baseband Processor, and/or an NPU (Neural-Network Processing Unit), etc. The method in this exemplary embodiment may be performed by the AP, GPU or DSP, and may be performed by the NPU when the method involves neural network related processing, e.g., the NPU may load neural network parameters and execute neural network related algorithm instructions.
An encoder may encode (i.e., compress) an image or video to reduce the data size for storage or transmission, and a decoder may decode (i.e., decompress) the encoded data to recover the image or video. The mobile terminal 900 may support one or more encoders and decoders for, for example, image formats such as JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), and BMP (Bitmap), and video formats such as MPEG-1, MPEG-2, MPEG-4, H.263, H.264, and HEVC (High Efficiency Video Coding).
The processor 901 may be connected to the memory 902 or other components via the bus 903.
The memory 902 may be used to store computer-executable program code, which includes instructions. The processor 901 performs various functional applications and data processing of the mobile terminal 900 by executing instructions stored in the memory 902. The memory 902 may also store application data, such as files for storing images, videos, and the like.
The communication function of the mobile terminal 900 may be implemented by the mobile communication module 904, the antenna 1, the wireless communication module 905, the antenna 2, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 904 may provide a mobile communication solution of 3G, 4G, 5G, etc. applied to the mobile terminal 900. The wireless communication module 905 may provide wireless communication solutions for wireless local area network, bluetooth, near field communication, etc. applied to the mobile terminal 900.
The display screen 906 is used to implement display functions, such as displaying a user interface, images, videos, and the like. The camera module 907 is used for implementing shooting functions, such as shooting images, videos, and the like, and may include a color temperature sensor array therein. The audio module 908 is used for implementing audio functions, such as playing audio, capturing voice, and the like. The power module 909 is used to implement power management functions such as charging batteries, powering devices, monitoring battery status, etc. The sensor module 910 may include one or more sensors for implementing corresponding sensing functions. For example, the sensor module 910 may include an inertial sensor for detecting a motion pose of the mobile terminal 900 and outputting inertial sensing data.
It should be noted that, in the embodiments of the present disclosure, a computer-readable storage medium is also provided, and the computer-readable storage medium may be included in the electronic device described in the foregoing embodiments; or may exist separately without being assembled into the electronic device.
A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The computer readable storage medium carries one or more programs which, when executed by one of the electronic devices, cause the electronic device to implement the method as described in the embodiments below.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed, for example, synchronously or asynchronously in multiple modules.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims. It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (11)

1. A method of searching a network, comprising:
acquiring a plurality of sampling networks from the sample data set;
coding a current sampling network in a plurality of sampling networks to obtain a current structure code, and aggregating the current structure code and a historical structure code to obtain a model mean value;
obtaining an optimized structure code according to the optimized precision loss of the model mean value;
and decoding the optimized structure code to obtain an optimized network structure until the optimized network structures of all sampling networks are obtained iteratively according to the optimized network structure, and determining a target network structure so as to process the object to be processed based on the target network structure.
2. The network searching method of claim 1, wherein the acquiring a plurality of sampling networks from the sample data set comprises:
acquiring a sample data set;
and dividing the precision interval of the sample data set into a plurality of intervals, and sampling the sample data set in each interval by the same uniform number to determine a plurality of sampling networks.
3. The network searching method according to claim 2, wherein the acquiring the sample data set comprises:
randomly sampling a plurality of sub-networks in a search space, and training the sub-networks to obtain the sample data set with real precision labels.
4. The network searching method according to claim 1, wherein the aggregating the current structure code with a historical structure code to obtain a model mean value comprises:
performing a dimension-wise weighted average on structures of the same type in the current structure code and the historical structure code to obtain the model mean value.
5. The network searching method according to claim 4, wherein the performing a dimension-wise weighted average on structures of the same type in the current structure code and the historical structure code to obtain the model mean value comprises:
determining a task type of a search task; and
determining a corresponding weight value according to the task type, and performing the dimension-wise weighted average on the structures of the same type based on the weight value to determine the model mean value.
6. The network searching method according to claim 1, wherein the obtaining an optimized structure code according to an optimized precision loss of the model mean value comprises:
performing gradient optimization on an initial precision loss of a precision predictor to determine the optimized precision loss; and
performing model training based on the optimized precision loss to obtain the optimized structure code.
7. The network searching method according to claim 1, wherein the decoding the optimized structure code to obtain an optimized network structure, iteratively obtaining optimized network structures of all the sampling networks, and determining a target network structure comprises:
determining a historical structure code according to the optimized network structure corresponding to the current sampling network; and
iteratively searching a next sampling network with the historical structure code to obtain an optimized network structure of the next sampling network, until optimized network structures are determined for all sampling networks in the sample data set, so as to obtain the target network structure.
8. The network searching method according to claim 7, wherein the iteratively searching a next sampling network with the historical structure code to obtain an optimized network structure of the next sampling network comprises:
aggregating the next sampling network with the historical structure code to obtain a model mean value of the next sampling network;
updating the optimized structure code according to an optimized precision loss of the model mean value of the next sampling network; and
decoding the updated optimized structure code to obtain the optimized network structure of the next sampling network.
9. A network searching apparatus, comprising:
a sampling module, configured to acquire a plurality of sampling networks from a sample data set;
an aggregation module, configured to encode a current sampling network of the plurality of sampling networks to obtain a current structure code, and to aggregate the current structure code with a historical structure code to obtain a model mean value;
a decoding module, configured to obtain an optimized structure code according to an optimized precision loss of the model mean value; and
a network structure determining module, configured to decode the optimized structure code to obtain an optimized network structure, iteratively obtain optimized network structures of all sampling networks, and determine a target network structure, so as to process an object to be processed based on the target network structure.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the network searching method of any one of claims 1 to 8 via execution of the executable instructions.
11. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the network searching method of any one of claims 1 to 8.
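Taken together, method claims 1 and 4–8 describe an iterative, predictor-guided search loop: encode a sampled network, aggregate its code with the historical code into a model mean, optimize that mean against a precision predictor's loss, decode, and carry the result forward as the next iteration's history. The sketch below is a minimal, illustrative Python rendering of that loop under loudly simplified assumptions: the integer op-choice encoding, the `toy_loss` stand-in for the precision predictor, and the finite-difference gradient step are all hypothetical, not the patent's actual implementation, and every function name is invented for illustration.

```python
import random

def encode(network):
    # Toy encoding (claim 1): the network is already a list of op indices.
    return list(network)

def aggregate(current_code, history_codes, weight=0.5):
    # Dimension-wise weighted average of same-type structures (claims 4-5).
    if not history_codes:
        return list(current_code)
    mean = [sum(c[i] for c in history_codes) / len(history_codes)
            for i in range(len(current_code))]
    return [weight * cur + (1 - weight) * m
            for cur, m in zip(current_code, mean)]

def optimize(code, predict_loss, lr=0.1, steps=10, eps=1e-3):
    # Gradient-style optimization of the predictor's precision loss (claim 6),
    # using finite differences in place of a trained predictor's gradients.
    code = list(code)
    for _ in range(steps):
        for i in range(len(code)):
            base = predict_loss(code)
            code[i] += eps
            grad = (predict_loss(code) - base) / eps
            code[i] -= eps + lr * grad  # undo the probe, then descend
    return code

def decode(code):
    # Decode a structure code back to a discrete network (claim 1):
    # toy rounding to the nearest op index.
    return [round(x) for x in code]

def search(sampling_networks, predict_loss):
    # Iterative search over sampled networks (claims 7-8): each optimized
    # structure becomes the historical code for the next sampling network.
    history, best = [], None
    for net in sampling_networks:
        mean = aggregate(encode(net), history)
        optimized_net = decode(optimize(mean, predict_loss))
        history.append(encode(optimized_net))
        if best is None or predict_loss(encode(optimized_net)) < predict_loss(encode(best)):
            best = optimized_net
    return best  # the target network structure

# Toy precision predictor: loss is minimized when every op index equals 2.
toy_loss = lambda code: sum((x - 2) ** 2 for x in code)
random.seed(0)
nets = [[random.randint(0, 4) for _ in range(4)] for _ in range(5)]
target = search(nets, toy_loss)
```

Under this toy loss every search converges to the op vector `[2, 2, 2, 2]`; in the claimed method the predictor would instead be trained on the real precision labels of claim 3, and encoding/decoding would map between genuine architectures and their structure codes.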
CN202211585844.8A 2022-12-09 2022-12-09 Network searching method and device, electronic equipment and storage medium Pending CN115952829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211585844.8A CN115952829A (en) 2022-12-09 2022-12-09 Network searching method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115952829A true CN115952829A (en) 2023-04-11

Family

ID=87281660




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination