CN111260035A - Method, system, equipment and medium for searching structural parameters of EfficientNet

Method, system, equipment and medium for searching structural parameters of EfficientNet

Info

Publication number: CN111260035A
Application number: CN202010057657.7A
Authority: CN (China)
Prior art keywords: efficientnet, structural parameters, training, preset number, preset
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 于彤
Current Assignee: Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee: Suzhou Inspur Intelligent Technology Co Ltd
Application filed 2020-01-19 by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010057657.7A (priority date 2020-01-19)
Publication of CN111260035A on 2020-06-09

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods


Abstract

The invention discloses a method for searching the structural parameters of EfficientNet, which comprises the following steps: randomly initializing structural parameters in a search space of a preset range; generating an EfficientNet from the randomly initialized structural parameters; predicting the final precision of the EfficientNet after training for a first preset number of epochs; in response to the predicted final precision being less than a threshold, predicting again the final precision of the EfficientNet after training for a second preset number of epochs, the second preset number being greater than the first; and in response to either the first or the second predicted final precision being not less than the threshold, stopping training, retaining and recording the corresponding structural parameters, and returning to the step of randomly initializing structural parameters. The invention also discloses a system, a computer device and a readable storage medium. The invention replaces the original grid search with random search and, by predicting the final precision in advance, avoids training every neural network determined by a combination of structural parameters to convergence, thereby greatly reducing the consumption of computing resources.

Description

Method, system, equipment and medium for searching structural parameters of EfficientNet
Technical Field
The invention relates to the field of neural networks, and in particular to a method, a system, a device and a storage medium for searching the structural parameters of EfficientNet.
Background
EfficientNet is a family of networks proposed at ICML 2019 (May 2019). Its general idea follows the established line of CNN development: it is a model-scaling approach, but with a more systematic method. First, a compound model-scaling method is employed: depth, width and image resolution are not adjusted independently; instead, dependencies among the three parameters are introduced and their configuration is balanced, so that network accuracy is maximized under the available computing resources. Adjusting these three parameters yields a series of networks of different accuracies, from EfficientNet-B0 to B7. The feasibility of the method was verified on MobileNet and ResNet. Second, the base architecture of EfficientNet was searched with the AutoML MNAS framework; the resulting architecture is similar to MobileNetV2 but more accurate. Finally, the EfficientNet series achieves higher accuracy than existing CNNs while reducing computation and parameter count, making it arguably one of the most efficient network families published.
In the existing method, the three structure-related parameters needed to scale EfficientNet (depth, width, image resolution) are determined by grid search. Model scaling does not change the base architecture of the model; it only determines, according to the available computing resources, by how much that base architecture is scaled.
When scaling the network, the parameters adjusted by EfficientNet are mainly the depth (d) of the network, the width (w), and the resolution (r) of the training images; the overall problem can be defined as follows.
The i-th layer of a CNN can be defined as $Y_i = F_i(X_i)$, where $Y_i$ is the output tensor and $X_i$ is the input tensor of shape $\langle H_i, W_i, C_i \rangle$, with $H_i$ and $W_i$ the spatial dimensions and $C_i$ the number of channels. A CNN can generally be divided into several stages, each of which has substantially the same structure, so the network can be represented as

$$N = \bigodot_{i=1 \ldots s} F_i^{L_i}\big(X_{\langle H_i, W_i, C_i \rangle}\big)$$

where $F_i^{L_i}$ denotes the layer $F_i$ repeated $L_i$ times in stage $i$. Conventional CNN design seeks the optimal layer structure $F_i$. The focus of EfficientNet is different: $F_i$, the basic structure of the network, is left unchanged, and scaling is used to find the optimal depth (d), width (w) and resolution (r) of the network, so as to obtain the most accurate model under the current resource configuration. The main idea is as follows:
$$\max_{d,w,r}\ \mathrm{Accuracy}\big(N(d, w, r)\big)$$

i.e., the accuracy of the network is maximized subject to the following conditions:

$$N(d, w, r) = \bigodot_{i=1 \ldots s} \hat{F}_i^{\,d \cdot \hat{L}_i}\big(X_{\langle r \cdot \hat{H}_i,\ r \cdot \hat{W}_i,\ w \cdot \hat{C}_i \rangle}\big) \quad \text{(the scaled network structure)}$$

$$\mathrm{Memory}(N) \le \text{target\_memory} \quad \text{(memory consumption below the target value)}$$

$$\mathrm{FLOPS}(N) \le \text{target\_flops} \quad \text{(computation below the target value)}$$
where d, w and r are the parameters determining the scale of the model and the resolution of the input images. These three parameters are correlated, and in EfficientNet they are scaled uniformly by a compound coefficient $\phi$ as follows:
depth: $d = \alpha^{\phi}$

width: $w = \beta^{\phi}$

resolution: $r = \gamma^{\phi}$

$$\text{s.t.}\quad \alpha \cdot \beta^2 \cdot \gamma^2 \approx 2, \qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1$$
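For illustration only (not part of the patent text), the compound-scaling rule above can be written as a short Python sketch; the base depth/width/resolution values and the rounding policy are assumptions:

```python
def compound_scale(phi, alpha, beta, gamma,
                   base_depth=18, base_width=320, base_resolution=224):
    """Compound scaling: d = alpha**phi, w = beta**phi, r = gamma**phi.

    phi is the user-specified compound coefficient; alpha, beta and gamma are
    the searched structural parameters (with alpha * beta**2 * gamma**2 ~ 2).
    Base values and rounding are illustrative assumptions, not from the patent.
    """
    d = alpha ** phi   # depth multiplier
    w = beta ** phi    # width (channel) multiplier
    r = gamma ** phi   # input-resolution multiplier
    return (round(base_depth * d),
            round(base_width * w),
            round(base_resolution * r))

# Example with commonly cited coefficients (treated here as assumptions):
print(compound_scale(phi=1.0, alpha=1.2, beta=1.1, gamma=1.15))
# -> (22, 352, 258)
```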
In EfficientNet, $\phi$ is a parameter specified according to the available computing resources and is set manually by the user, while $\alpha$, $\beta$ and $\gamma$ are parameters determined by grid search; they control how a given budget of computing resources is allocated to the width, depth and resolution of the network, so as to maximize its accuracy.
This method of searching for the parameters $\alpha$, $\beta$ and $\gamma$ has two main disadvantages: first, sampling is performed by grid search; second, every time the parameters change, the new network must be trained to convergence, which makes the parameter search inefficient.
Grid search is a relatively traditional parameter-search method and remains one of the most widely used. It is a simple exhaustive search: the user determines the possible values of all parameters in advance, a network is trained with each possible combination of structural parameters, and the performance of each model is finally evaluated on a validation set; the combination used by the best-performing model is the one sought. The advantage of grid search is that it is simple to run, and it is feasible when both the number of parameters and the number of samples are small. When the number of parameters is large, however, it is hard to guarantee that the searched combinations contain the optimal solution, and the curse of dimensionality is easily encountered. For EfficientNet only 3 parameters need to be handled, so grid search is feasible but not optimal, for two reasons. First, the distribution and the sampling interval of each parameter must be fixed in advance; in this task the three parameters are essentially assumed to be uniformly distributed between 1 and 2 and sampled at equidistant intervals with a step size of 0.1. Before tuning the hyperparameters one must know the possible values of each parameter, and the final search result necessarily lies within this predefined set. In reality, however, the true "optimum" is not necessarily in this predetermined sample space, because the step size is not very small; yet setting the sampling step too small consumes computing resources. Second, grid search values every group of samples equally: the samples must be distributed evenly over the search space, and the step size does not shrink where sample values are more likely to be close to the optimum. A concrete enumeration of such a grid is sketched below.
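As a minimal sketch, the grid described above can be enumerated as follows; the step size 0.1 over [1, 2] follows the text, while the tolerance on the constraint $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$ and the selection routine are assumptions:

```python
import itertools

def grid_candidates(step=0.1, lo=1.0, hi=2.0, tol=0.09):
    """Enumerate (alpha, beta, gamma) on an equidistant grid and keep only
    combinations with alpha * beta**2 * gamma**2 close to 2 (the tolerance
    value is an assumption)."""
    n = int(round((hi - lo) / step)) + 1
    axis = [round(lo + i * step, 10) for i in range(n)]
    for a, b, g in itertools.product(axis, repeat=3):
        if abs(a * b ** 2 * g ** 2 - 2.0) <= tol:
            yield a, b, g

# Plain grid search would train one network per surviving candidate to
# convergence and keep the combination with the best top-1 accuracy; the
# unfiltered grid already holds 11**3 = 1331 candidates.
print(sum(1 for _ in grid_candidates()))
```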
The evaluation criterion for the parameters is as follows: for each group of parameters a new network is generated, each network is trained to convergence, and the top-1 accuracies are compared; the parameters used by the most accurate network are the desired result. The advantage is that the selected network is guaranteed to have the highest accuracy; the disadvantage is that every network must be trained until its accuracy stops improving, which consumes computing resources. In many cases it is already apparent after a few epochs of training that the accuracy will not be as good as desired. The EfficientNet family was originally developed by the Google team, for whom computing resources were presumably sufficient, but for most ordinary users who need to tune hyperparameters, saving computing resources can mean saving time or node-rental costs.
Disclosure of Invention
In view of this, in order to overcome at least one aspect of the above problems, an embodiment of the present invention provides a method for searching for a structural parameter of an EfficientNet, including the following steps:
randomly initializing structural parameters in a search space of a preset range;
generating an EfficientNet from the randomly initialized structural parameters;
predicting the final precision of the EfficientNet after training for a first preset number of epochs;
in response to the predicted final precision being less than a threshold, predicting again the final precision of the EfficientNet after training for a second preset number of epochs, wherein the second preset number is greater than the first preset number;
and in response to the predicted or re-predicted final precision being not less than the threshold, stopping training, retaining and recording the corresponding structural parameters, and returning to the step of randomly initializing the structural parameters.
In some embodiments, further comprising:
and stopping training in response to the re-predicted final precision being less than the threshold, and returning to the step of randomly initializing the structural parameters.
In some embodiments, the step of returning to randomly initializing the structural parameters further comprises:
judging whether the number of times the structural parameters have been randomly initialized reaches a preset number, or whether the search time reaches a preset time;
in response to the number of random initializations not reaching the preset number and the search time not reaching the preset time, returning to the step of randomly initializing the structural parameters;
and in response to the number of random initializations reaching the preset number or the search time reaching the preset time, stopping the search.
In some embodiments, predicting the final precision of the EfficientNet after training for a first preset number of epochs further comprises:
fully training several EfficientNets generated from several groups of randomly initialized structural parameters;
generating a fitted curve from the full training results of these EfficientNets according to a maximum-likelihood method;
predicting the final precision using the fitted curve.
In some embodiments, generating a fitted curve from the full training results of these EfficientNets according to a maximum-likelihood method further comprises:
determining the weight of each sub-function in the fitted curve from the full training results of these EfficientNets.
In some embodiments, predicting the final precision using the fitted curve further comprises:
determining the parameters of each sub-function from the training result obtained by training for the first preset number of epochs;
predicting the final precision using the parameters of each sub-function and the weight of each sub-function.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a system for searching for structural parameters of EfficientNet, including:
a search module configured to randomly initialize structural parameters in a search space of a preset range;
a generation module configured to generate an EfficientNet from the randomly initialized structural parameters;
a first prediction module configured to predict the final precision of the EfficientNet after training for a first preset number of epochs;
a second prediction module configured to predict again the final precision of the EfficientNet after training for a second preset number of epochs in response to the predicted final precision being less than a threshold, wherein the second preset number is greater than the first preset number;
a processing module configured to stop training, retain and record the corresponding structural parameters, and return to the step of randomly initializing the structural parameters, in response to the predicted or re-predicted final precision being not less than the threshold.
In some embodiments, the processing module is further configured to:
and stopping training in response to the re-predicted final precision being less than the threshold, and returning to the step of randomly initializing the structural parameters.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer apparatus, including:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor, when executing the program, performs the steps of any one of the above methods for searching the structural parameters of EfficientNet.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of any one of the methods for searching for structural parameters of EfficientNet described above.
The method has the advantage that random search replaces the original grid search for determining the values of the structural parameters $\alpha$, $\beta$ and $\gamma$ in EfficientNet, making the hyperparameter search more efficient and accurate and lowering the demands on the user's prior knowledge; in addition, by predicting the final precision in advance, not every neural network determined by a combination of structural parameters needs to be trained to convergence, which greatly reduces the consumption of computing resources.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a method for searching for structural parameters of EfficientNet according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a search system for structural parameters of EfficientNet according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a computer device provided in an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name but are not identical; "first" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention, and subsequent embodiments do not describe them again.
According to an aspect of the present invention, an embodiment of the present invention provides a method for searching the structural parameters of EfficientNet, which, as shown in Fig. 1, may include the following steps: S1, randomly initializing structural parameters in a search space of a preset range; S2, generating an EfficientNet from the randomly initialized structural parameters; S3, predicting the final precision of the EfficientNet after training for a first preset number of epochs; S4, in response to the predicted final precision being less than a threshold, predicting again the final precision of the EfficientNet after training for a second preset number of epochs, the second preset number being greater than the first; and S5, in response to the predicted or re-predicted final precision being not less than the threshold, stopping training, retaining and recording the corresponding structural parameters, and returning to the step of randomly initializing the structural parameters.
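A minimal sketch of steps S1 to S5 as a single search loop follows; the three callables are hypothetical stand-ins for the generation, training and prediction components described in the rest of this embodiment:

```python
import random

def random_search(generate_net, train_epochs, predict_final_acc,
                  threshold, n1, n2, max_samples, lo=1.0, hi=2.0):
    """Random search with two-stage early stopping (steps S1 to S5).

    generate_net, train_epochs and predict_final_acc are hypothetical
    stand-ins; n1 and n2 are the first and second preset epoch counts
    (n2 > n1). All defaults are assumptions."""
    kept = []
    for _ in range(max_samples):
        # S1: randomly initialize (alpha, beta, gamma) in the search space
        params = tuple(random.uniform(lo, hi) for _ in range(3))
        net = generate_net(*params)                      # S2
        hist = train_epochs(net, n1)                     # S3: first n1 epochs
        if predict_final_acc(hist) >= threshold:
            kept.append(params)                          # S5: keep and record
            continue
        hist = train_epochs(net, n2 - n1, resume=hist)   # S4: on to n2 epochs
        if predict_final_acc(hist) >= threshold:
            kept.append(params)                          # S5
        # else: predicted precision still below threshold, abandon this net
    return kept
```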
In addition, by predicting the final precision in advance, not every neural network determined by a combination of structural parameters needs to be trained to convergence, which greatly reduces the consumption of computing resources.
In some embodiments, the method further comprises:
and stopping training in response to the re-predicted final precision being less than the threshold, and returning to the step of randomly initializing the structural parameters.
Specifically, if the first prediction does not reach the threshold, several more epochs can be trained and the prediction repeated, to judge whether the final result will reach the threshold; if it still cannot, training is stopped. If the threshold can be reached, the result is retained and recorded.
In some embodiments, the step of returning to randomly initializing the structural parameters further comprises:
judging whether the number of times the structural parameters have been randomly initialized reaches a preset number, or whether the search time reaches a preset time;
in response to the number of random initializations not reaching the preset number and the search time not reaching the preset time, returning to the step of randomly initializing the structural parameters;
and in response to the number of random initializations reaching the preset number or the search time reaching the preset time, stopping the search.
Specifically, $\alpha$, $\beta$ and $\gamma$ may be randomly initialized in the search space $[1, 2]$, with $1.91 \le \alpha \cdot \beta^2 \cdot \gamma^2 \le 2.09$; the search task time may be limited to 200 hours and the maximum number of samples to 100. The search is stopped as soon as either of these two conditions is met, i.e., in response to the number of random initializations reaching the preset number or the search time reaching the preset time; if neither condition is met, random initialization of the structural parameters continues.
When randomly initializing the structural parameters, a new point $(\alpha_{i+1}, \beta_{i+1}, \gamma_{i+1})$ may be taken in the given search space near the current position $(\alpha_i, \beta_i, \gamma_i)$; if $f(\alpha_{i+1}, \beta_{i+1}, \gamma_{i+1}) > f(\alpha_i, \beta_i, \gamma_i)$, the current position is updated to $(\alpha_{i+1}, \beta_{i+1}, \gamma_{i+1})$, and the next sampling position is then selected.
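This neighborhood sampling amounts to a simple hill-climbing step; below is a sketch, assuming that f scores a parameter triple (for example by predicted final precision) and that proposals are clipped to the search space (the step size and the clipping are assumptions):

```python
import random

def propose_and_accept(current, f, step=0.05, lo=1.0, hi=2.0):
    """Take a new point near the current (alpha, beta, gamma) and keep it
    only if it scores higher under f; otherwise stay at the current point."""
    candidate = tuple(min(hi, max(lo, x + random.uniform(-step, step)))
                      for x in current)
    return candidate if f(*candidate) > f(*current) else current
```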
In some embodiments, predicting the final precision of the EfficientNet after training for a first preset number of epochs further comprises:
fully training several EfficientNets generated from several groups of randomly initialized structural parameters;
generating a fitted curve from the full training results of these EfficientNets according to a maximum-likelihood method;
predicting the final precision using the fitted curve.
In some embodiments, generating a fitted curve from the full training results of these EfficientNets according to a maximum-likelihood method further comprises:
determining the weight of each sub-function in the fitted curve from the full training results of these EfficientNets.
In some embodiments, predicting the final precision using the fitted curve further comprises:
determining the parameters of each sub-function from the training result obtained by training for the first preset number of epochs;
predicting the final precision using the parameters of each sub-function and the weight of each sub-function.
Specifically, the evaluation criterion for the structural parameters is the training precision that the EfficientNet can reach: the higher the model's precision, the better the group of structural parameters. However, fully training a neural network until its precision no longer rises takes a long time, so it is desirable to decide in advance whether training needs to continue. The early-stop strategy here is similar to that used in neural-network training, but its purpose is somewhat different: early stopping in neural networks is meant to prevent overfitting, whereas early stopping in the structural-parameter training is meant to save computing resources; that is, skipping the remaining training does not affect the accuracy of the final result, and the whole process becomes faster.
In the embodiment of the invention, in the structural-parameter task of EfficientNet, following the principle of imitating manual early stopping of training, a statistical model is used to fit a curve to the first several training results, so that subsequent neural-network models need not be trained fully to convergence: after a few epochs of training, the curve can be used to predict the accuracy of later training and to decide whether to stop training in advance. The basic idea is that when the training of the neural network x corresponding to a certain group of structural parameters reaches step S (i.e., a preset number of epochs has run), the final result is extrapolated from the trend of the results (precision) so far; if that result is worse than the worst result in the completed trainings (or a set threshold), training for this group of structural parameters is stopped. The curve used for fitting is obtained by a linear weighted mixture of 11 types of monotonically increasing, saturating functions $f_k(x)$, and can be expressed as:
$$f_{\mathrm{comb}}(x) = \sum_{k=1}^{K} w_k f_k(x)$$

plus Gaussian-distributed noise:

$$y \sim \mathcal{N}\big(f_{\mathrm{comb}}(x), \sigma^2\big)$$

The new parameter space is $\xi = (w_1, \ldots, w_K, \theta_1, \ldots, \theta_K, \sigma^2)$, where $w_k$ is the weight of each function, $\theta_k$ collects the parameters of each function, and $\sigma^2$ is the noise. The initial value of each weight $w_k$ is $1/K$; the parameters $\theta$ are determined by maximum-likelihood estimation, with sampling performed by Markov chain Monte Carlo inference (MCMC); maximum-likelihood estimation is also used for the noise $\sigma^2$.
It should be noted that the weights of the 11 functions used to fit the curve may be determined from the training results of the initial groups of structural parameters (i.e., the weights are fixed when the curve is fitted), while the parameters of each function (e.g., a, b, c) are determined from the training results of each individual group of structural parameters.
In some embodiments, the 11 functions $f_k(x)$ used to fit the curve are given in the original publication largely as equation images; the forms preserved in the text are:

$f_2(x) = c - ax$

$f_3(x) = \log(a \log(x) + b)$

$f_8(x) = c - (ax + b)$

(the remaining function definitions are not recoverable from the extracted text).
the algorithm mainly comprises the following two steps when judging whether to execute early termination:
specifically, the first several neural networks may be trained completely until the precision does not increase, then the weight value of each function is determined according to the result generated by the training in combination with the MCMC sampling, and when the MCMC sampling is performed, the precision corresponding to different epochs may be randomly sampled, and then the weight of each function is determined according to the maximum likelihood method.
Predicting, after obtaining a complete fitting curve, predicting the precision on a specific epoch by using the fitting curve, if the prediction precision does not reach a threshold value, judging whether the threshold value is reached again after a plurality of epochs, and if the threshold value is not reached, stopping training. If the threshold is reached, the results are retained and recorded.
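As an illustrative sketch of this fit-then-predict procedure, two saturating basis functions can be fitted to the early-epoch accuracies and their weighted mixture extrapolated to a later epoch; ordinary least squares stands in here for the MCMC/maximum-likelihood machinery described above, the second basis function is a generic saturating power law rather than one confirmed by the text, and all names and numbers are assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

def f_loglog(x, a, b):
    # f3(x) = log(a*log(x) + b), clipped to stay inside the log's domain
    return np.log(np.maximum(a * np.log(x) + b, 1e-9))

def f_pow(x, c, a, alpha):
    # generic saturating power law c - a * x**(-alpha) (an assumption)
    return c - a * x ** (-alpha)

def fit_and_extrapolate(epochs, accs, weights, target_epoch):
    """Fit each basis function to observed (epoch, accuracy) pairs and
    extrapolate the weighted mixture to target_epoch; the weights would
    come from the earlier fully trained networks."""
    basis = [(f_loglog, (1.0, 1.0)), (f_pow, (1.0, 1.0, 0.5))]
    preds = []
    for fn, p0 in basis:
        params, _ = curve_fit(fn, epochs, accs, p0=p0, maxfev=10000)
        preds.append(fn(np.float64(target_epoch), *params))
    return float(np.dot(weights, preds))

# Usage sketch: accuracies over the first 5 epochs, extrapolated to epoch 90
epochs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
accs = np.array([0.42, 0.55, 0.61, 0.65, 0.68])
print(fit_and_extrapolate(epochs, accs, weights=[0.5, 0.5], target_epoch=90.0))
```

If the extrapolated value falls below the threshold, training of the corresponding network would be stopped, exactly as in the two-step procedure above.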
It should be noted that, before curve fitting, the parameters the user needs to determine in advance include at least: how many full trainings are used to determine the fitted curve; whether to use an early-stop threshold; and the number of epochs between two evaluations. The remaining parameters, related to computing resources, may likewise be left to the user.
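These user-supplied knobs could be grouped into a small configuration object; every field name below is a hypothetical illustration, with the sampling and time budgets following the 100-sample / 200-hour example given earlier:

```python
from dataclasses import dataclass

@dataclass
class EarlyStopConfig:
    n_full_trainings: int = 5    # complete runs used to fit the curve weights
    use_threshold: bool = True   # whether to early-stop against a threshold
    eval_interval: int = 10      # epochs between two prediction checks
    max_samples: int = 100       # sampling budget (from the embodiment)
    max_hours: float = 200.0     # wall-clock budget (from the embodiment)
```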
Compared with plain grid search, the scheme of the invention, by combining random search with early stopping, can greatly reduce the training time for the structural parameters with almost no loss of training precision. In this embodiment only 3 structural parameters need to be searched; when the number of structural parameters to be handled grows, or the grid is made finer, the saving in computing resources becomes even more pronounced.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a search system 400 for structural parameters of EfficientNet, as shown in fig. 2, including:
a search module 401, the search module 401 being configured to randomly initialize structural parameters in a search space of a preset range;
a generation module 402, the generation module 402 being configured to generate an EfficientNet from the randomly initialized structural parameters;
a first prediction module 403, the first prediction module 403 being configured to predict the final precision of the EfficientNet after training for a first preset number of epochs;
a second prediction module 404, the second prediction module 404 being configured to predict again the final precision of the EfficientNet after training for a second preset number of epochs in response to the predicted final precision being less than a threshold, wherein the second preset number is greater than the first preset number;
a processing module 405, the processing module 405 being configured to stop training, retain and record the corresponding structural parameters, and return to the step of randomly initializing the structural parameters, in response to the predicted or re-predicted final precision being not less than the threshold.
In some embodiments, the processing module 405 is further configured to:
and stopping training in response to the re-predicted final precision being less than the threshold, and returning to the step of randomly initializing the structural parameters.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 3, an embodiment of the present invention further provides a computer apparatus 501, comprising:
at least one processor 520; and
a memory 510, the memory 510 storing a computer program 511 executable on the processor, the processor 520 executing the program to perform the steps of any one of the above methods for searching for structural parameters of EfficientNet.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, an embodiment of the present invention further provides a computer-readable storage medium 601, where the computer-readable storage medium 601 stores computer program instructions 610, and the computer program instructions 610, when executed by a processor, perform the steps of the method for searching for structural parameters of EfficientNet as any one of the above.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program to instruct related hardware to implement the methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
In addition, the apparatuses, devices, and the like disclosed in the embodiments of the present invention may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, and the like, or may be a large terminal device, such as a server, and the like, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of apparatus, device. The client disclosed by the embodiment of the invention can be applied to any one of the electronic terminal devices in the form of electronic hardware, computer software or a combination of the electronic hardware and the computer software.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps of implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A method for searching the structural parameters of EfficientNet, characterized by comprising the following steps:
randomly initializing structural parameters in a search space of a preset range;
generating an EfficientNet from the randomly initialized structural parameters;
predicting the final precision of the EfficientNet after training for a first preset number of epochs;
in response to the predicted final precision being less than a threshold, predicting again the final precision of the EfficientNet after training for a second preset number of epochs, wherein the second preset number is greater than the first preset number;
and in response to the predicted or re-predicted final precision being not less than the threshold, stopping training, retaining and recording the corresponding structural parameters, and returning to the step of randomly initializing the structural parameters.
2. The method of claim 1, further comprising:
and stopping training in response to the re-predicted final precision being less than the threshold, and returning to the step of randomly initializing the structural parameters.
3. The method of claim 1 or 2, wherein the step of returning to randomly initializing the structural parameters further comprises:
judging whether the number of times the structural parameters have been randomly initialized reaches a preset number, or whether the search time reaches a preset time;
in response to the number of random initializations not reaching the preset number and the search time not reaching the preset time, returning to the step of randomly initializing the structural parameters;
and in response to the number of random initializations reaching the preset number or the search time reaching the preset time, stopping the search.
4. The method of claim 1, wherein predicting the final precision of the EfficientNet after training for a first preset number of epochs further comprises:
fully training several EfficientNets generated from several groups of randomly initialized structural parameters;
generating a fitted curve from the full training results of these EfficientNets according to a maximum-likelihood method;
predicting the final precision using the fitted curve.
5. The method of claim 4, wherein generating a fitted curve from the full training results of these EfficientNets according to a maximum-likelihood method further comprises:
determining the weight of each sub-function in the fitted curve from the full training results of these EfficientNets.
6. The method of claim 5, wherein predicting the final precision using the fitted curve further comprises:
determining the parameters of each sub-function from the training result obtained by training for the first preset number of epochs;
predicting the final precision using the parameters of each sub-function and the weight of each sub-function.
7. A searching system for structural parameters of EfficientNet, comprising:
a search module configured to randomly initialize structural parameters in a search space of a preset range;
a generation module configured to generate an EfficientNet from the randomly initialized structural parameters;
a first prediction module configured to predict the final precision of the EfficientNet after training for a first preset number of epochs;
a second prediction module configured to predict again the final precision of the EfficientNet after training for a second preset number of epochs in response to the predicted final precision being less than a threshold, wherein the second preset number is greater than the first preset number;
a processing module configured to stop training, retain and record the corresponding structural parameters, and return to the step of randomly initializing the structural parameters, in response to the predicted or re-predicted final precision being not less than the threshold.
8. The system of claim 7, wherein the processing module is further configured to:
and stopping training in response to the re-predicted final precision being less than the threshold, and returning to the step of randomly initializing the structural parameters.
9. A computer device, comprising:
at least one processor; and
memory storing a computer program operable on the processor, characterized in that the processor executes the program to perform the steps of the method according to any of claims 1-6.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1-6.
CN202010057657.7A · Priority date 2020-01-19 · Filing date 2020-01-19 · Method, system, equipment and medium for searching structural parameters of EfficientNet · Pending · CN111260035A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010057657.7A CN111260035A (en) 2020-01-19 2020-01-19 Method, system, equipment and medium for searching structural parameters of EfficientNet


Publications (1)

Publication Number Publication Date
CN111260035A (en) · 2020-06-09

Family

ID=70953044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010057657.7A · Method, system, equipment and medium for searching structural parameters of EfficientNet · Priority date 2020-01-19 · Filing date 2020-01-19 · Pending · CN111260035A (en)

Country Status (1)

Country Link
CN (1) CN111260035A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668702A (en) * 2021-01-15 2021-04-16 北京格灵深瞳信息技术股份有限公司 Fixed-point parameter optimization method, system, terminal and storage medium
CN112668702B (en) * 2021-01-15 2023-09-19 北京格灵深瞳信息技术股份有限公司 Fixed-point parameter optimization method, system, terminal and storage medium
CN113128680A (en) * 2021-03-12 2021-07-16 山东英信计算机技术有限公司 Neural network training method, system, device and medium
CN113128680B (en) * 2021-03-12 2022-06-10 山东英信计算机技术有限公司 Neural network training method, system, device and medium
CN113282926A (en) * 2021-05-25 2021-08-20 贵州师范大学 Malicious software classification method based on three-channel image

Similar Documents

Publication Publication Date Title
CN111260035A (en) Method, system, equipment and medium for searching structural parameters of EfficientNet
WO2017219991A1 (en) Optimization method and apparatus suitable for model of pattern recognition, and terminal device
CN112381216B (en) Training and predicting method and device for mixed graph neural network model
CN110235149B (en) Neural plot control
CN113420880B (en) Network model training method and device, electronic equipment and readable storage medium
CN113344170B (en) Neural network weight matrix adjustment method, write-in control method and related device
WO2021052140A1 (en) Anticipatory learning method and system oriented towards short-term time series prediction
Liu et al. An activity-list-based nested partitions algorithm for resource-constrained project scheduling
CN116822651A (en) Large model parameter fine adjustment method, device, equipment and medium based on incremental learning
CN112069294A (en) Mathematical problem processing method, device, equipment and storage medium
Li et al. A compression pipeline for one-stage object detection model
KR20220097329A (en) Method and algorithm of deep learning network quantization for variable precision
CN114049530A (en) Hybrid precision neural network quantization method, device and equipment
CN105740916B (en) Characteristics of image coding method and device
CN117113174A (en) Model training method and device, storage medium and electronic equipment
CN112749557A (en) Text processing model construction method and text processing method
JP2024510634A (en) Learning to transform sensitive data using variable distribution storage
CN115545168A (en) Dynamic QoS prediction method and system based on attention mechanism and recurrent neural network
CN115496162A (en) Model training method, device and equipment
CN115392426A (en) Self-organizing migration service processing method, device and equipment
CN114862003A (en) Request quantity prediction method and device, electronic equipment and storage medium
CN117219190A (en) Training method, device, equipment, medium and program product of molecular generation model
US20230068381A1 (en) Method and electronic device for quantizing dnn model
CN117577097A (en) Model training method, device, electronic equipment and medium
Malaysia Mobile Intelligent Web Pre-fetching Scheme for Cloud Computing Services in Industrial Revolution 4.0

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200609)