CN111667056B - Method and apparatus for searching model structures - Google Patents


Info

Publication number
CN111667056B
CN111667056B
Authority
CN
China
Prior art keywords
model structure
candidate
trained
classification threshold
searching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010503202.3A
Other languages
Chinese (zh)
Other versions
CN111667056A (en)
Inventor
希滕
张刚
温圣召
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010503202.3A
Publication of CN111667056A
Application granted
Publication of CN111667056B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The application discloses a method and apparatus for searching model structures, relating to the technical fields of artificial intelligence, deep learning, and image processing. The method comprises the following steps: obtaining a classification threshold of a model structure to be replaced under at least one preset recall rate; determining a search space for the model structure, initializing a model structure generator, and iterating the following steps: searching out a candidate model structure in the search space with the model structure generator, training the candidate model structure, and obtaining the classification threshold of the trained candidate model structure under each preset recall rate; generating feedback information according to the difference between the classification thresholds of the trained candidate model structure and of the model structure to be replaced under the same preset recall rate, and updating the model structure generator based on the feedback information before the next iteration; and stopping iteration when the model structure generator reaches a preset convergence condition, determining the candidate model structure of the current iteration as the target model structure. This method improves the accuracy of the searched-out model structure.

Description

Method and apparatus for searching model structures
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to the field of artificial intelligence, deep learning, and image processing technologies, and more particularly, to a method and apparatus for searching a model structure.
Background
Deep learning techniques have achieved great success in many areas. In deep learning, the quality of the model structure (i.e., the architecture of the neural network) has a decisive influence on the performance of the final model. However, designing model structures by hand requires very rich designer experience and extensive trial of combinations, and brute-force search is infeasible because the many network parameters produce a combinatorial explosion of candidates. In recent years, neural architecture search (NAS) has therefore become a research hotspot: it uses an algorithm to automatically search for an optimal model structure in place of cumbersome manual design.
Existing NAS-based automatic model structure search methods suffer from low accuracy of the searched-out model structure.
Disclosure of Invention
A method, apparatus, electronic device, and computer-readable storage medium for searching a model structure are provided.
According to a first aspect, there is provided a method for searching a model structure, the method comprising:
obtaining a classification threshold of the model structure to be replaced under at least one preset recall rate, wherein the classification threshold is the threshold adopted by the model structure to be replaced when mapping features of the data to be classified to the corresponding category;
determining a search space of the model structure, initializing a model structure generator, and searching out a target model structure through multiple rounds of an iterative operation. The iterative operation includes: searching out a candidate model structure in the search space with the model structure generator, training the candidate model structure, and obtaining the classification threshold of the trained candidate model structure under each preset recall rate; generating feedback information according to the difference between the classification thresholds of the trained candidate model structure and of the model structure to be replaced under the same preset recall rate, and updating the model structure generator based on the feedback information before the next iteration; and, in response to determining that the model structure generator has reached a preset convergence condition, determining the candidate model structure of the current iteration as the target model structure.
According to a second aspect, there is provided an apparatus for searching a model structure, the apparatus comprising:
an obtaining unit configured to obtain a classification threshold of the model structure to be replaced under at least one preset recall rate, wherein the classification threshold is the threshold adopted by the model structure to be replaced when mapping features of the data to be classified to the corresponding category; and a searching unit configured to determine a search space of the model structure, initialize a model structure generator, and search out a target model structure through multiple iterative operations. The searching unit includes: a computing unit configured to search out a candidate model structure in the search space with the model structure generator, train the candidate model structure, and obtain the classification threshold of the trained candidate model structure under each preset recall rate; a generating unit configured to generate feedback information according to the difference between the classification thresholds of the trained candidate model structure and of the model structure to be replaced under the same preset recall rate, and to update the model structure generator based on the feedback information before the next iteration; and a determining unit configured to determine the candidate model structure of the current iteration as the target model structure in response to determining that the model structure generator has reached a preset convergence condition.
According to a third aspect, embodiments of the present disclosure provide an electronic device comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for searching a model structure provided in the first aspect.
According to a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the method for searching a model structure provided by the first aspect.
According to the method and apparatus for searching a model structure, feedback information is generated from the difference between the classification thresholds of the candidate model structure and of the model structure to be replaced under the same recall rate, and the parameters of the model structure generator are iteratively updated according to the feedback information, so that the performance of the finally searched-out target model structure meets expectations and the accuracy of the model structure search is improved.
This technique solves the problem of low accuracy in the model structures found by automatic model structure search methods.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which embodiments of the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method for searching a model structure in accordance with the present application;
FIG. 3 is a flow chart of another embodiment of a method for searching a model structure in accordance with the present application;
FIG. 4 is a schematic diagram of an embodiment of an apparatus for searching a model structure in accordance with the present application;
fig. 5 is a block diagram of an electronic device for implementing a method for searching a model structure in accordance with an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments are included to facilitate understanding and are to be regarded as merely exemplary. Those of ordinary skill in the art will accordingly recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Descriptions of well-known functions and constructions are likewise omitted below for clarity and conciseness.
FIG. 1 illustrates an exemplary system architecture 100 in which embodiments of the method for searching a model structure or apparatus for searching a model structure of the present application may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various client applications, such as an image classification application, an information classification application, a search class application, a shopping class application, a financial class application, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and capable of receiving server messages, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be any of the electronic devices listed above. When they are software, they may be installed in those electronic devices and implemented either as multiple pieces of software or software modules (for example, multiple software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing background services for applications running on the terminal devices 101, 102, 103 or may be a server providing support for a neural network model running on the terminal devices 101, 102, 103. The server 105 may acquire data to be processed from the terminal devices 101, 102, 103, process the data to be processed using the neural network model, and return the processing results to the terminal devices 101, 102, 103. The server 105 may also train a neural network model performing various deep learning tasks (e.g., image processing, speech recognition, text translation, etc.) using the image data, voice data, text data, etc. acquired from the terminal devices 101, 102, 103 or databases, and transmit the trained neural network model to the terminal devices 101, 102, 103. Alternatively, the server 105 may automatically search out a neural network model structure with good performance based on the deep learning task to be performed, and train the searched out neural network model structure based on the media data.
It should be noted that the method for searching a model structure provided by the embodiments of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for searching a model structure is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for searching a model structure according to the present disclosure is shown. A method for searching a model structure, comprising the steps of:
step 201, obtaining a classification threshold of the model structure to be replaced under at least one preset recall rate.
Wherein the classification threshold is the threshold adopted by the model structure to be replaced when mapping features of the data to be classified to the corresponding category.
In this embodiment, an execution subject (e.g., a server shown in fig. 1) of the method for searching a model structure may first acquire a model structure to be replaced, which may be a model structure for realizing functions of object classification, object recognition, object verification, and the like. In practice, the model structure to be replaced may be a neural network model running on-line, for example, a neural network model supporting a specific function in various end applications, such as an image classification model, a speech recognition model, and so on.
The execution body may further obtain the classification threshold of the model structure to be replaced under at least one preset recall rate. Recall (recall ratio) is the ratio of the amount of relevant information retrieved from a database to the total amount of relevant information, an index measuring how completely a retrieval system finds relevant data in a dataset; in a classification system with positive and negative categories, recall is the proportion of positive samples correctly identified as positive. The classification threshold is the threshold adopted when the model structure to be replaced maps features of the data to be classified to the corresponding category in order to classify the data. This threshold may be preset for determining, from the extracted features, the category to which an object to be classified belongs, and it is applied while the model structure to be replaced runs and classifies objects such as images, speech, and text. Concretely, the classification threshold may be the probability threshold above which a target is assigned to a given category according to its extracted features, and it is associated with the recall rate.
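Because the threshold and the recall rate are tied together, the classification threshold under a preset recall rate can be read off from the scores a model assigns to known-positive samples. The helper below is an illustrative sketch (the function name and data are hypothetical, not from the patent):

```python
import math

def threshold_at_recall(pos_scores, recall):
    """Largest classification threshold such that at least `recall`
    of the positive samples score at or above it."""
    scores = sorted(pos_scores, reverse=True)
    k = math.ceil(recall * len(scores))  # positives that must be recalled
    return scores[k - 1]

# Scores a model assigns to ten known-positive samples.
pos = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
print(threshold_at_recall(pos, 0.9))   # 0.2: threshold recalling 9 of 10
print(threshold_at_recall(pos, 0.5))   # 0.6: threshold recalling 5 of 10
```

Raising the preset recall rate forces the threshold down, which is exactly the association between threshold and recall described above.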
In this embodiment, the server may obtain the classification thresholds of the model structure to be replaced under several different recall rates, so as to optimize the network parameters of the model structure generator (i.e., the neural network used to search for model structures) against these thresholds, thereby improving search accuracy.
Step 202, determining a search space of the model structure, initializing a model structure generator, and searching out a target model structure through multiple iterative operations.
In this embodiment, a search space of model structures usable for classification tasks is determined. The search space contains a number of candidate model structures, which may be composed of basic building units. The model structure generator samples basic building units from the search space and assembles complete candidate model structures by stacking and concatenating them. The generator produces model structures based on the search space and external feedback information, and may be implemented as a recurrent neural network, a convolutional neural network, a reinforcement learning algorithm, an evolutionary algorithm, a simulated annealing algorithm, and so on. The network parameters of the model structure generator (the connection weights between neurons in different layers of the neural network) or its structure sampling policy (the method of sampling model structures or basic building units from the search space) may be initialized. Multiple iterative operations are then performed until the iteration ends and the target model structure is searched out.
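To make these components concrete, the sketch below defines a toy search space of basic building units and a generator whose structure sampling policy is a per-position categorical distribution. The space, names, and uniform initialization are all illustrative assumptions; a real generator could equally be an RNN or a reinforcement learning policy as described above.

```python
import random

# Hypothetical search space: at each of `depth` positions, one basic
# building unit is chosen; a candidate model structure is the tuple of choices.
SEARCH_SPACE = {
    "units": ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool", "identity"],
    "depth": 6,
}

class ModelStructureGenerator:
    """Toy generator: a structure sampling policy given by one categorical
    distribution per position, initialized uniformly."""

    def __init__(self, space, seed=0):
        self.space = space
        self.rng = random.Random(seed)
        n = len(space["units"])
        self.probs = [[1.0 / n] * n for _ in range(space["depth"])]

    def sample(self):
        # Stack one sampled building unit per position into a full structure.
        return tuple(
            self.rng.choices(self.space["units"], weights=w)[0]
            for w in self.probs
        )

gen = ModelStructureGenerator(SEARCH_SPACE)
candidate = gen.sample()
print(candidate)  # a depth-6 candidate model structure
```

Updating the generator then amounts to adjusting `self.probs` from the feedback information, which the iterative operation below performs once per round.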
The iterative operation includes steps 2021 to 2023:
in step 2021, a candidate model structure is searched in the search space by using a model structure generator, the candidate model structure is trained, and a classification threshold of the trained candidate model structure under each preset recall rate is obtained.
In each iteration, a candidate model structure may be searched out in the search space using the current model structure generator. Specifically, model structure encoding rules may be predefined, and the model structure generator may generate sequences that characterize candidate model structures. For example, when the model structure generator is implemented as a recurrent neural network, the search space may be encoded as input data to the recurrent neural network, and a candidate model structure is obtained by decoding the sequence output by the recurrent neural network according to the encoding rules. Alternatively, when the model structure generator is implemented as a reinforcement learning algorithm, it may generate a state sequence that is decoded according to the same rules to obtain a candidate model structure.
The searched-out candidate model structure may then be trained using the training data until it converges. Then, the classification threshold of the converged candidate model structure under each preset recall rate is obtained, the recall rates being the same as those used to calculate the classification thresholds of the model structure to be replaced.
The training data may be a sample dataset obtained by the server from the terminal devices, a training dataset read from local storage or a knowledge base, or a training dataset obtained over the Internet or by other means. In this embodiment, the training data may be training data for a classification task, such as image data in a target recognition or verification task. Whether the candidate model structure has converged may be judged by whether a preset performance index reaches a preset convergence threshold; for example, with classification accuracy as the index and 90% as the threshold, the candidate model structure is judged to have converged when its classification accuracy reaches 90%.
In this embodiment, whether training of the candidate model has finished may also be decided by training time: the candidate model structure is searched out in the search space with the model structure generator and trained on the training data until the training time reaches a preset duration, after which the classification threshold of the trained candidate model structure under each preset recall rate is obtained.
In step 2022, feedback information is generated according to the difference between the classification threshold values of the trained candidate model structure and the model structure to be replaced at the same preset recall rate, and the model structure generator is updated based on the feedback information before executing the next iteration operation.
In this embodiment, feedback information is generated according to the difference between the classification thresholds of the trained candidate model structure and of the model structure to be replaced under the same preset recall rate, and the parameters of the model structure generator are updated based on this feedback information before the next iteration. The difference may be a mathematical result computed from the two classification thresholds, for example their difference or their quotient. Alternatively, a difference threshold may be preset, and a binary symbol may indicate whether the mathematical result falls below that threshold, characterizing whether the two classification thresholds differ; other machine-readable expressions of the difference are also possible. The model structure generator is then updated based on the feedback information: specifically, it may update its parameters or its structure sampling policy.
For example, when the model structure generator is a recurrent neural network, its parameters can be updated by back-propagating the feedback information with gradient descent; when the model structure generator is a reinforcement learning algorithm, the feedback information serves as the reward, and the generator regenerates the state sequence based on that reward.
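These two pieces, forming a reward from the threshold difference and using the reward to update a sampling policy, can be sketched together. The negative mean absolute difference as the reward and the REINFORCE-style logit shift are illustrative choices under stated assumptions, not the patent's exact formulas.

```python
import math

def threshold_feedback(cand_thresholds, repl_thresholds):
    """Reward from classification-threshold differences under the same
    preset recall rates (one entry per rate): higher when closer."""
    diffs = [abs(c - r) for c, r in zip(cand_thresholds, repl_thresholds)]
    return -sum(diffs) / len(diffs)

def update_policy(probs, chosen_idx, reward, lr=0.5):
    """REINFORCE-style sketch: shift the chosen option's logit by
    lr * reward and renormalize, so positive rewards make it likelier."""
    logits = [math.log(p) for p in probs]
    logits[chosen_idx] += lr * reward
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Candidate vs. model-to-be-replaced thresholds under two preset recall rates.
reward = threshold_feedback([0.42, 0.61], [0.40, 0.60])   # about -0.015
probs = update_policy([0.25, 0.25, 0.25, 0.25], chosen_idx=2, reward=1.0)
print(probs[2] > 0.25)  # True: the rewarded choice gains probability
```

In a real search the reward would be small and negative until the candidate's thresholds approach those of the model to be replaced, at which point the policy stabilizes around the matching structure.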
In step 2023, in response to determining that the model structure generator reaches the preset convergence condition, the candidate model structure in the current iteration operation is determined as the target model structure.
In this embodiment, when the model structure generator reaches a preset convergence condition, the iterative operation ends, and the candidate model structure searched out in the last iteration is determined as the target model structure. The convergence condition may be that the number of iterations reaches a preset count, that the iteration time reaches a preset duration, or that the performance of the candidate model structure searched out by the generator reaches the expected level.
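The overall iterative operation, with two of the stopping conditions above (iteration budget and feedback reaching the expected level), can be sketched as a skeleton in which `sample`, `evaluate`, and `update` are hypothetical stand-ins for the generator and training steps. The toy usage, where "structures" are integers, is purely illustrative.

```python
import random

def search(sample, evaluate, update, max_iters=50, target_feedback=0.0):
    """Iterate: sample a candidate, evaluate it to obtain feedback,
    update the generator before the next iteration; stop when the
    feedback meets expectations or the iteration budget is exhausted."""
    candidate = None
    for _ in range(max_iters):
        candidate = sample()
        feedback = evaluate(candidate)      # e.g. -|threshold difference|
        if feedback >= target_feedback:     # preset convergence condition
            break
        update(candidate, feedback)         # refine the sampling policy
    return candidate  # target structure from the final iteration

# Toy usage: candidates are integers, feedback rewards closeness to 7, and
# the "generator" narrows its sampling range (it peeks at the target only
# for brevity; a real update would rely on the feedback signal alone).
rng = random.Random(0)
state = {"lo": 0, "hi": 20}
result = search(
    sample=lambda: rng.randint(state["lo"], state["hi"]),
    evaluate=lambda x: -abs(x - 7),
    update=lambda x, fb: state.update(lo=x + 1) if x < 7 else state.update(hi=x - 1),
)
print(result)  # 7: the search converges within the iteration budget
```

Each update keeps the optimum inside the sampling range while shrinking it, so the loop always terminates on the feedback condition before the budget runs out.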
According to the method for searching a model structure, feedback information is generated from the difference between the classification thresholds of the candidate model structure and of the model structure to be replaced under the same recall rate, and the network parameters of the model structure generator are iteratively updated according to this feedback. In this way, the difference between the features that the model structure to be replaced and the candidate model structure extract from the objects to be classified is fed back to the generator, so that the generator produces candidate model structures consistent with the features extracted by the model structure to be replaced, and the threshold of the finally searched-out target model structure under a preset recall rate is close to that of the model structure to be replaced under the same recall rate. Consequently, when the model structure to be replaced is replaced by the searched-out target model structure, the classification threshold need not be redesigned or changed, which reduces the complexity of replacing an online model structure. Furthermore, because the feedback information is derived from the threshold difference and the generator is updated iteratively from it, the model structure to be replaced does not need to be retrained, reducing the computing and memory resources consumed by training and saving hardware resources.
The method of this embodiment can determine a target model structure for building a neural network model that performs an image processing task. Since image data are generally converted into matrix data when processed by a neural network model, a large number of matrix operations are involved, and training such a target model structure incurs substantial time and hardware costs. By generating feedback information from the difference between the classification thresholds of the candidate model structure and of the model structure to be replaced under the same recall rate, and iteratively updating the network parameters of the model structure generator accordingly, this embodiment saves the time and hardware resources of retraining the target model structure and improves the efficiency of building a neural network model for image processing tasks.
With further reference to FIG. 3, a flow 300 of yet another embodiment of a method for searching a model structure is shown. The flow 300 of the method for searching a model structure comprises the steps of:
step 301, obtaining a classification threshold of the model structure to be replaced under at least one preset recall rate, wherein the classification threshold is the threshold adopted by the model structure to be replaced when mapping features of the data to be classified to the corresponding category.
Step 302, determining a search space of the model structure, initializing a model structure generator, and searching out the target model structure through multiple iterative operations.
The iterative operation includes steps 3021 and 3022:
step 3021, searching a candidate model structure in the search space by using a model structure generator, training the candidate model structure, and obtaining classification thresholds of the trained candidate model structure under each preset recall rate.
step 3022, determining performance information of the trained candidate model structure, and generating feedback information according to both the difference between the classification thresholds of the trained candidate model structure and of the model structure to be replaced under the same preset recall rate and the performance information of the trained candidate model structure, the model structure generator being updated based on the feedback information before the next iteration.
In this embodiment, the performance information of the trained candidate model structure and the difference between the classification thresholds of the trained candidate model structure and of the model structure to be replaced under the same preset recall rate are obtained first; the difference may simply be the difference between the two thresholds. Feedback information is then generated from this threshold difference together with the performance information, and the network parameters of the model structure generator are updated in the next iteration according to the feedback. The feedback information is a performance index used to adjust the network parameters of the neural network; by iteratively adjusting these parameters, the performance of the neural network at the end of the iterations can be made to meet expectations.
By generating feedback information from both the difference between the classification thresholds of the trained candidate model structure and of the model structure to be replaced under the same preset recall rate and the performance information of the trained candidate model structure, the searched-out target model structure can replace the model structure to be replaced in classification tasks without modifying the classification policy, while its performance still meets expectations, thereby improving search accuracy.
In some optional implementations of the present embodiment, the feedback information may also be generated by: generating a first feedback value according to the difference between the classification threshold values of the trained candidate model structure and the model structure to be replaced under the same preset recall rate; generating a second feedback value according to the performance information of the trained candidate model structure; and taking the weighted sum of the first feedback value and the second feedback value as feedback information.
In this implementation, a first feedback value is generated from the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced under the same preset recall rate, a second feedback value is generated from the performance information of the trained candidate model structure, and the weighted sum of the two is used as the feedback information. For example, if the first feedback value is 4 with weight 0.9 and the second feedback value is 12 with weight 0.1, the weighted sum 0.9 × 4 + 0.1 × 12 = 4.8 is used as the feedback information. Introducing weights adjusts the influence of the first and second feedback values on the feedback information, so that the final target model structure can emphasize either performance or strategy similarity with the model structure to be replaced; the search result thus better matches user requirements, further improving search accuracy.
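The weighted combination described above can be written directly; the numbers below reproduce the worked example in the text (the function name and default weights are illustrative):

```python
def combine_feedback(first_value, second_value, w_first=0.9, w_second=0.1):
    """Weighted sum of the two feedback values, used as the feedback information."""
    return w_first * first_value + w_second * second_value

# Reproduces the example in the text: 0.9 * 4 + 0.1 * 12 = 4.8
feedback = combine_feedback(4, 12)
print(round(feedback, 6))  # 4.8 (up to floating-point rounding)
```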
Optionally, in this embodiment, the weight of the first feedback value is smaller than the weight of the second feedback value. In that case, the search strategy of the model structure generator emphasizes finding model structures with good performance, which can improve the performance of the finally searched target model structure.
Step 3023: in response to determining that the model structure generator has reached the preset convergence condition, determining the candidate model structure in the current iteration operation as the target model structure.
Steps 301, 302, 3021, and 3023 in this embodiment are respectively identical to steps 201, 202, 2021, and 2023 in the foregoing embodiment; for their specific implementations, refer to the descriptions of the corresponding steps in the foregoing embodiment, which are not repeated here.
In some alternative implementations of the embodiments described above in connection with fig. 2 and 3, the method for searching for model structures further includes: replacing the model structure to be replaced with a target model structure; and classifying the data to be classified by using the target model structure.
According to the above method for searching a model structure, feedback information is generated from the difference between the classification thresholds of the candidate model structure and the model structure to be replaced under the same preset recall rate, and the parameters of the model structure generator are iteratively updated according to this feedback information, so that the performance of the finally searched target model structure meets expectations and the accuracy of the model structure search is improved.
In some optional implementations of the foregoing embodiments, the iterative operation may further include: in response to determining that the model structure generator has not reached the preset convergence condition, performing the next iteration operation based on the updated model structure generator. In this way, the model structure generator is gradually optimized through repeated iteration operations, so that it generates target model structures whose classification thresholds are closer to those of the model structure to be replaced, or whose performance is better while their classification thresholds remain close, realizing automatic optimization of the model structure.
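Putting the pieces together, the iteration described above can be sketched as a generic controller loop. This is a toy stand-in, not the patent's implementation: random sampling replaces the learned generator and its update, training is assumed already done (each candidate carries its thresholds and performance), and all names are illustrative:

```python
import random

def search_model_structure(search_space, target_thresholds,
                           max_iters=100, w_first=0.9, w_second=0.1):
    """Sample candidates, score each by threshold similarity plus performance,
    and keep the best one (a stand-in for the generator and its feedback update)."""
    best, best_feedback = None, float("-inf")
    for _ in range(max_iters):
        candidate = random.choice(search_space)   # stand-in for the generator
        # First feedback value: negative gap between the classification thresholds
        # of the candidate and the model to be replaced at each preset recall rate.
        gap = sum(abs(c - t) for c, t in
                  zip(candidate["thresholds"], target_thresholds))
        first = -gap
        second = candidate["performance"]         # second feedback value
        feedback = w_first * first + w_second * second
        if feedback > best_feedback:
            best, best_feedback = candidate, feedback
    return best

random.seed(0)
space = [
    {"name": "a", "thresholds": [0.6, 0.7], "performance": 0.90},
    {"name": "b", "thresholds": [0.5, 0.65], "performance": 0.95},
]
best = search_model_structure(space, target_thresholds=[0.5, 0.65])
print(best["name"])  # "b": matching thresholds dominate under w_first = 0.9
```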
With further reference to fig. 4, as an implementation of the method shown in the foregoing figures, the present disclosure provides an embodiment of an apparatus for searching a model structure, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 4, the apparatus 400 for searching a model structure of the present embodiment includes: an acquisition unit 401, a search unit 402, a calculation unit 4021, a generation unit 4022, and a determination unit 4023. The acquisition unit 401 is configured to obtain a classification threshold of the model structure to be replaced under at least one preset recall rate, where the classification threshold is the threshold adopted by the model structure to be replaced when mapping features of the data to be classified to the corresponding categories. The search unit 402 is configured to determine a search space of the model structure, initialize a model structure generator, and search out a target model structure through multiple iteration operations. The search unit 402 includes the calculation unit 4021, the generation unit 4022, and the determination unit 4023, which perform the following steps in the iteration operation: the calculation unit 4021 searches out a candidate model structure in the search space using the model structure generator, trains the candidate model structure, and obtains the classification threshold of the trained candidate model structure under each preset recall rate; the generation unit 4022 generates feedback information according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced under the same preset recall rate, wherein the model structure generator is updated based on the feedback information before the next iteration operation is executed; and the determination unit 4023, in response to determining that the model structure generator has reached the preset convergence condition, determines the candidate model structure in the current iteration operation as the target model structure.
According to the above apparatus for searching a model structure, feedback information is generated from the difference between the classification thresholds of the candidate model structure and the model structure to be replaced under the same preset recall rate, and the parameters of the model structure generator are iteratively updated according to this feedback information, so that the performance of the finally searched target model structure meets expectations and the accuracy of the model structure search is improved.
In some embodiments, the search unit 402 further comprises: a test unit configured to perform the following steps in the iterative operation: determining performance information of the trained candidate model structure; the generating unit 4022 includes: the information generation module is configured to generate feedback information according to the difference between the classification threshold values of the trained candidate model structure and the model structure to be replaced under the same preset recall rate and the performance information of the trained candidate model structure.
In some embodiments, the information generation module comprises: the first generation module is configured to generate a first feedback value according to the difference between the classification threshold values of the trained candidate model structure and the model structure to be replaced under the same preset recall rate; the second generation module is configured to generate a second feedback value according to the performance information of the trained candidate model structure; and the fusion module is configured to take the weighted sum of the first feedback value and the second feedback value as feedback information.
In some embodiments, the weight of the first feedback value is less than the weight of the second feedback value.
In some embodiments, the means for searching for the model structure further comprises: a replacing unit configured to replace the model structure to be replaced with the target model structure; and the processing unit is configured to classify the data to be classified by utilizing the target model structure.
The units in the above-described apparatus 400 correspond to the steps in the method described with reference to fig. 2 and 3. The operations and features described above for the method for searching a model structure, and the technical effects achieved thereby, are equally applicable to the apparatus 400 and the units contained therein, and are not described in detail here.
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.
Fig. 5 is a block diagram of an electronic device for the method of searching a model structure according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, memory 502, and interfaces for connecting components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Likewise, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is taken as an example in fig. 5.
Memory 502 is a non-transitory computer readable storage medium provided by the present application. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for searching a model structure provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for searching a model structure provided by the present application.
The memory 502 is used as a non-transitory computer readable storage medium, and may be used to store a non-transitory software program, a non-transitory computer executable program, and modules, such as program instructions/modules corresponding to the method for searching a model structure in the embodiment of the present application (e.g., the acquisition unit 401, the search unit 402, the calculation unit 4021, the generation unit 4022, and the determination unit 4023 shown in fig. 4). The processor 501 executes various functional applications of the server and data processing, i.e., implements the method for searching for a model structure in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 502.
Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created from the use of the electronic device for searching the model structure, and the like. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 502 optionally includes memory remotely located with respect to processor 501, which may be connected to the electronic device for searching the model structure via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method of searching a model structure may further include: input means 503, output means 504 and bus 505. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus 505 or otherwise, in fig. 5 by way of example by bus 505.
The input device 503 may receive entered numeric or character information and generate key signal inputs related to user settings and function control of the electronic device used to search for the model structure, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, and the like. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (11)

1. A method for searching a model structure, wherein the model structure is used to perform classification tasks, comprising:
obtaining a classification threshold of a model structure to be replaced under at least one preset recall rate, wherein the classification threshold comprises: the to-be-replaced model structure maps the characteristics of to-be-classified data to threshold values adopted by corresponding categories, wherein the to-be-classified data comprises image data, voice data and text data, and the corresponding categories comprise image categories, voice categories and text categories;
determining a search space of a model structure, initializing a model structure generator, and searching out a target model structure through multiple rounds of iterative operation;
the iterative operation includes:
searching a candidate model structure in the search space by using the model structure generator, training the candidate model structure and obtaining a classification threshold value of the trained candidate model structure under each preset recall rate;
generating feedback information according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced under the same preset recall rate, wherein the model structure generator is updated based on the feedback information before executing the next iteration operation;
in response to determining that the model structure generator reaches a preset convergence condition, determining a candidate model structure in the current iterative operation as a target model structure;
the method further comprises the steps of:
replacing the model structure to be replaced with the target model structure;
and classifying the data to be classified by using the target model structure.
2. The method of claim 1, wherein the iterative operation further comprises:
determining performance information of the trained candidate model structure;
the generating feedback information according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced under the same preset recall rate comprises the following steps:
and generating feedback information according to the difference between the classification threshold values of the trained candidate model structure and the model structure to be replaced under the same preset recall rate and the performance information of the trained candidate model structure.
3. The method of claim 2, wherein the generating feedback information according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced under the same preset recall rate, and the performance information of the trained candidate model structure, comprises:
generating a first feedback value according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced under the same preset recall rate;
generating a second feedback value according to the performance information of the trained candidate model structure;
and taking the weighted sum of the first feedback value and the second feedback value as the feedback information.
4. A method according to claim 3, wherein the first feedback value has a weight that is less than the weight of the second feedback value.
5. An apparatus for searching a model structure, wherein the model structure is used to perform classification tasks, comprising:
the obtaining unit is configured to obtain a classification threshold value of the model structure to be replaced under at least one preset recall rate, wherein the classification threshold value comprises: the to-be-replaced model structure maps the characteristics of to-be-classified data to threshold values adopted by corresponding categories, wherein the to-be-classified data comprises image data, voice data and text data, and the corresponding categories comprise image categories, voice categories and text categories;
The searching unit is configured to determine a searching space of the model structure, initialize a model structure generator and search out a target model structure through multiple iterative operations;
the search unit includes:
a computing unit configured to perform the following steps in the iterative operation: searching a candidate model structure in the search space by using the model structure generator, training the candidate model structure and obtaining a classification threshold value of the trained candidate model structure under each preset recall rate;
a generating unit configured to perform the following steps in the iterative operation: generating feedback information according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced under the same preset recall rate, wherein the model structure generator is updated based on the feedback information before executing the next iteration operation;
a determining unit configured to perform the following steps in the iterative operation: in response to determining that the model structure generator reaches a preset convergence condition, determining a candidate model structure in the current iterative operation as a target model structure;
the apparatus further comprises:
A replacing unit configured to replace the model structure to be replaced with the target model structure;
and the processing unit is configured to classify the data to be classified by utilizing the target model structure.
6. The apparatus of claim 5, wherein the search unit further comprises:
a test unit configured to perform the following steps in the iterative operation: determining performance information of the trained candidate model structure;
the generation unit includes:
the information generation module is configured to generate feedback information according to the difference between the classification threshold values of the trained candidate model structure and the model structure to be replaced under the same preset recall rate and the performance information of the trained candidate model structure.
7. The apparatus of claim 6, wherein the information generation module comprises:
the first generation module is configured to generate a first feedback value according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced under the same preset recall rate;
the second generation module is configured to generate a second feedback value according to the performance information of the trained candidate model structure;
And the fusion module is configured to take the weighted sum of the first feedback value and the second feedback value as the feedback information.
8. The apparatus of claim 7, wherein the first feedback value has a weight that is less than a weight of the second feedback value.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
10. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-4.
11. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-4.
CN202010503202.3A 2020-06-05 2020-06-05 Method and apparatus for searching model structures Active CN111667056B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010503202.3A CN111667056B (en) 2020-06-05 2020-06-05 Method and apparatus for searching model structures


Publications (2)

Publication Number Publication Date
CN111667056A CN111667056A (en) 2020-09-15
CN111667056B true CN111667056B (en) 2023-09-26

Family

ID=72386474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010503202.3A Active CN111667056B (en) 2020-06-05 2020-06-05 Method and apparatus for searching model structures

Country Status (1)

Country Link
CN (1) CN111667056B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836801A (en) * 2021-02-03 2021-05-25 上海商汤智能科技有限公司 Deep learning network determination method and device, electronic equipment and storage medium
CN112989361B (en) * 2021-04-14 2023-10-20 华南理工大学 Model security detection method based on generation countermeasure network
CN113076903A (en) * 2021-04-14 2021-07-06 上海云从企业发展有限公司 Target behavior detection method and system, computer equipment and machine readable medium
CN113806519A (en) * 2021-09-24 2021-12-17 金蝶软件(中国)有限公司 Search recall method, device and medium
CN116188834B (en) * 2022-12-08 2023-10-20 赛维森(广州)医疗科技服务有限公司 Full-slice image classification method and device based on self-adaptive training model

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463704A (en) * 2017-08-16 2017-12-12 北京百度网讯科技有限公司 Searching method and device based on artificial intelligence
CN108959552A (en) * 2018-06-29 2018-12-07 北京百度网讯科技有限公司 Recognition methods, device, equipment and the storage medium of question and answer class query statement
CN109816116A (en) * 2019-01-17 2019-05-28 腾讯科技(深圳)有限公司 The optimization method and device of hyper parameter in machine learning model
CN110543944A (en) * 2019-09-11 2019-12-06 北京百度网讯科技有限公司 neural network structure searching method, apparatus, electronic device, and medium
CN110674326A (en) * 2019-08-06 2020-01-10 厦门大学 Neural network structure retrieval method based on polynomial distribution learning
CN110766142A (en) * 2019-10-30 2020-02-07 北京百度网讯科技有限公司 Model generation method and device
CN110782015A (en) * 2019-10-25 2020-02-11 腾讯科技(深圳)有限公司 Training method and device for network structure optimizer of neural network and storage medium
CN110852321A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Candidate frame filtering method and device and electronic equipment
CN110909877A (en) * 2019-11-29 2020-03-24 百度在线网络技术(北京)有限公司 Neural network model structure searching method and device, electronic equipment and storage medium
CN110956260A (en) * 2018-09-27 2020-04-03 瑞士电信公司 System and method for neural architecture search
CN111160448A (en) * 2019-12-26 2020-05-15 北京达佳互联信息技术有限公司 Training method and device for image classification model
CN111178546A (en) * 2019-12-31 2020-05-19 华为技术有限公司 Searching method of machine learning model, and related device and equipment
WO2022063247A1 (en) * 2020-09-28 2022-03-31 华为技术有限公司 Neural architecture search method and apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190286984A1 (en) * 2018-03-13 2019-09-19 Google Llc Neural architecture search by proxy
US20200160177A1 (en) * 2018-11-16 2020-05-21 Royal Bank Of Canada System and method for a convolutional neural network for multi-label classification with partial annotations


Also Published As

Publication number Publication date
CN111667056A (en) 2020-09-15

Similar Documents

Publication Title
CN111667056B (en) Method and apparatus for searching model structures
CN111639710B (en) Image recognition model training method, device, equipment and storage medium
CN111667054B (en) Method, device, electronic equipment and storage medium for generating neural network model
CN111582453B (en) Method and device for generating neural network model
CN111582479B (en) Distillation method and device for neural network model
CN111667057B (en) Method and apparatus for searching model structures
CN111708876B (en) Method and device for generating information
CN111967256B (en) Event relation generation method and device, electronic equipment and storage medium
CN111582454B (en) Method and device for generating neural network model
CN110795569B (en) Method, device and equipment for generating vector representation of knowledge graph
CN111737995A (en) Method, device, equipment and medium for training language model based on multiple word vectors
CN111461345B (en) Deep learning model training method and device
CN111563593B (en) Training method and device for neural network model
CN111680517B (en) Method, apparatus, device and storage medium for training model
CN111737954B (en) Text similarity determination method, device, equipment and medium
CN111859953B (en) Training data mining method and device, electronic equipment and storage medium
CN111241838B (en) Semantic relation processing method, device and equipment for text entity
CN111737996A (en) Method, device and equipment for obtaining word vector based on language model and storage medium
CN111639753B (en) Method, apparatus, device and storage medium for training image processing super network
CN112559870B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN111767833A (en) Model generation method and device, electronic equipment and storage medium
CN111695698A (en) Method, device, electronic equipment and readable storage medium for model distillation
CN114492788A (en) Method and device for training deep learning model, electronic equipment and storage medium
CN111967591A (en) Neural network automatic pruning method and device and electronic equipment
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant