CN111667056A - Method and apparatus for searching model structure - Google Patents

Method and apparatus for searching model structure

Info

Publication number
CN111667056A
Authority
CN
China
Prior art keywords
model structure
classification threshold
candidate
trained
recall rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010503202.3A
Other languages
Chinese (zh)
Other versions
CN111667056B (en)
Inventor
希滕
张刚
温圣召
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010503202.3A priority Critical patent/CN111667056B/en
Publication of CN111667056A publication Critical patent/CN111667056A/en
Application granted granted Critical
Publication of CN111667056B publication Critical patent/CN111667056B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The application discloses a method and apparatus for searching a model structure, relating to the technical fields of artificial intelligence, deep learning, and image processing. The method comprises: obtaining a classification threshold of a model structure to be replaced at at least one preset recall rate; determining a search space of the model structure, initializing a model structure generator, and iterating the following steps: searching out a candidate model structure in the search space using the model structure generator, training the candidate model structure, and obtaining a classification threshold of the trained candidate model structure at each preset recall rate; generating feedback information according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced at the same preset recall rate, and updating the model structure generator based on the feedback information before the next iteration; and stopping the iteration when the model structure generator reaches a preset convergence condition, and determining the candidate model structure in the current iteration as the target model structure. The method can improve the accuracy of the searched-out model structure.

Description

Method and apparatus for searching model structure
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to the technical field of artificial intelligence, deep learning and image processing, and particularly relates to a method and a device for searching a model structure.
Background
Deep learning techniques have achieved great success in many directions. In deep learning, the quality of the model structure (i.e., the structure of the neural network) has a very important influence on the effect of the final model. However, manually designing a model structure requires extensive designer experience and trials of many combinations, and conventional random search is hardly feasible because the numerous network parameters produce a combinatorial explosion. Therefore, Neural Network Architecture Search (NAS) has become a research focus in recent years; it replaces tedious manual design with algorithms that automatically search for an optimal model structure.
Existing NAS-based automatic model structure search methods suffer from low accuracy in the searched-out model structure.
Disclosure of Invention
A method, an apparatus, an electronic device, and a computer-readable storage medium for searching a model structure are provided.
According to a first aspect, there is provided a method for searching a model structure, the method comprising:
obtaining a classification threshold of a model structure to be replaced at at least one preset recall rate, wherein the classification threshold comprises: the threshold adopted by the model structure to be replaced when mapping features of the data to be classified to the corresponding category;
determining a search space of the model structure, initializing a model structure generator, and searching out a target model structure through multiple rounds of an iterative operation; the iterative operation comprises: searching out a candidate model structure in the search space using the model structure generator, training the candidate model structure, and obtaining a classification threshold of the trained candidate model structure at each preset recall rate; generating feedback information according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced at the same preset recall rate, and updating the model structure generator based on the feedback information before executing the next iteration; and in response to determining that the model structure generator reaches a preset convergence condition, determining the candidate model structure in the current iteration as the target model structure.
According to a second aspect, there is provided an apparatus for searching a model structure, the apparatus comprising:
an obtaining unit configured to obtain a classification threshold of the model structure to be replaced at at least one preset recall rate, where the classification threshold comprises: the threshold adopted by the model structure to be replaced when mapping features of the data to be classified to the corresponding category; and a searching unit configured to determine a search space of the model structure, initialize the model structure generator, and search out a target model structure through multiple rounds of an iterative operation. The searching unit includes: a computing unit configured to perform the following steps in the iterative operation: searching out a candidate model structure in the search space using the model structure generator, training the candidate model structure, and obtaining a classification threshold of the trained candidate model structure at each preset recall rate; a generating unit configured to perform the following steps in the iterative operation: generating feedback information according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced at the same preset recall rate, and updating the model structure generator based on the feedback information before executing the next iteration; and a determining unit configured to perform the following step in the iterative operation: in response to determining that the model structure generator reaches a preset convergence condition, determining the candidate model structure in the current iteration as the target model structure.
According to a third aspect, embodiments of the present disclosure provide an electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for searching a model structure as provided in the first aspect.
According to a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for searching a model structure provided by the first aspect.
According to the method and the device for searching the model structure, the feedback information is generated according to the difference of the classification threshold values of the candidate model structure and the model structure to be replaced under the same recall rate, and the parameters of the model structure generator are updated in an iterative mode according to the feedback information, so that the performance of the finally searched target model structure is in line with the expectation, and the accuracy of searching the model structure is improved.
The method and the device solve the problem that the accuracy of the model structure searched by the automatic model structure searching method is low.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which embodiments of the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for searching a model structure according to the present application;
FIG. 3 is a flow diagram of another embodiment of a method for searching a model structure according to the present application;
FIG. 4 is a schematic diagram illustrating one embodiment of an apparatus for searching model structures according to the present application;
FIG. 5 is a block diagram of an electronic device for implementing a method for searching a model structure according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the present methods for searching a model structure or apparatuses for searching a model structure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various client applications installed thereon, such as an image classification application, an information classification application, a search-class application, a shopping-class application, a financial-class application, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting the receipt of server messages, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.
The terminal devices 101, 102, and 103 may be hardware or software. When implemented as hardware, they may be various electronic devices. When implemented as software, they may be installed in the electronic devices listed above, either as multiple pieces of software or software modules (e.g., multiple modules providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing background services for applications running on the terminal devices 101, 102, 103, or may be a server providing support for a neural network model running on the terminal devices 101, 102, 103. The server 105 may obtain data to be processed from the terminal devices 101, 102, 103, process the data to be processed using the neural network model, and return the processing result to the terminal devices 101, 102, 103. The server 105 may also train a neural network model that performs various deep learning tasks (such as image processing, speech recognition, text translation, and the like) using media data such as image data, speech data, text data, and the like acquired from the terminal apparatuses 101, 102, 103 or a database, and transmit the trained neural network model to the terminal apparatuses 101, 102, 103. Alternatively, the server 105 may automatically search out a well-performing neural network model structure based on the deep learning task to be performed, and train the searched out neural network model structure based on the media data.
It should be noted that the method for searching the model structure provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the apparatus for searching the model structure is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for searching a model structure according to the present disclosure is shown. Method for searching a model structure, comprising the steps of:
step 201, obtaining a classification threshold value of a model structure to be replaced under at least one preset recall rate.
The classification threshold comprises: the threshold adopted by the model structure to be replaced when mapping features of the data to be classified to the corresponding category.
In this embodiment, an execution subject (e.g., a server shown in fig. 1) of the method for searching a model structure may first obtain a model structure to be replaced, where the model structure to be replaced may be a model structure for implementing functions of target classification, target identification, target verification, and the like. In practice, the model structure to be replaced may be a neural network model running on-line, for example, a neural network model supporting a specific function in various terminal applications, such as an image classification model, a speech recognition model, and the like.
The executing body may further obtain a classification threshold of the model structure to be replaced at at least one preset recall rate. The recall rate (recall ratio) is the proportion of all relevant items in a data set that a retrieval system actually retrieves, an index measuring how successfully the system finds relevant data; in a classification system with positive and negative classes, it is the proportion of all positive-class samples that are correctly identified as positive. The classification threshold is the threshold used when the model structure to be replaced maps features of the data to be classified to the corresponding category. It may be set in advance and is used, during operation of the model structure to be replaced, to determine the class to which an object to be classified (such as an image, a speech segment, or a text) belongs based on the features extracted from it. Specifically, the classification threshold may be a probability threshold above which the object is assigned to a certain class according to the extracted features, and the classification threshold is associated with the recall rate.
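The association between a preset recall rate and its classification threshold can be pictured with a short sketch. The helper below, its name, and its exact computation are our own assumptions for illustration; the patent does not specify how the threshold is obtained:

```python
import math

def threshold_at_recall(positive_scores, recall):
    """Return the highest classification threshold at which at least
    `recall` of the positive samples are still classified as positive.
    Hypothetical helper, not taken from the patent text."""
    scores = sorted(positive_scores, reverse=True)  # descending scores
    k = math.ceil(recall * len(scores))             # positives that must pass
    return scores[k - 1]

# e.g. with positive-sample scores [0.9, 0.8, 0.7, 0.6] and recall 0.75,
# the threshold is 0.7: three of the four positives score >= 0.7.
```

Raising the preset recall rate forces the threshold down, which is why each recall rate yields its own classification threshold to compare.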
In this embodiment, the server may obtain the classification thresholds of the model structure to be replaced at multiple different recall rates, so as to optimize the network parameters of the model structure generator (i.e., the neural network for searching the model structure) according to these thresholds, thereby improving the search accuracy of the neural network.
Step 202, determining a search space of the model structure, initializing the model structure generator, and searching out the target model structure through multiple rounds of iterative operations.
In this embodiment, a search space of model structures usable for the classification task is determined. The search space comprises multiple candidate model structures, which may be composed of basic building units. The model structure generator may sample basic building units from the search space and form a complete candidate model structure by stacking and connecting them. The model structure generator is used to generate model structures based on the search space and external feedback information, and may be implemented as a recurrent neural network, a convolutional neural network, a reinforcement learning algorithm, an evolutionary algorithm, a simulated annealing algorithm, and so on. Its network parameters (the connection weights between neurons in different layers of the neural network) or its structure sampling strategy (the method for sampling model structures or basic building units from the search space) may then be initialized. After that, multiple rounds of an iterative operation are performed until the iteration ends and the target model structure is searched out.
The iterative operation includes steps 2021, 2022:
step 2021, searching out a candidate model structure in the search space by using the model structure generator, training the candidate model structure, and obtaining a classification threshold of the trained candidate model structure at each preset recall rate.
In each round of the iterative operation, a candidate model structure may be searched out in the search space using the current model structure generator. Specifically, a model structure encoding rule may be predefined, and the model structure generator may generate a sequence characterizing a candidate model structure. For example, when the model structure generator is implemented as a recurrent neural network, an encoding of the search space may be used as the input to the recurrent neural network, and the candidate model structure may be obtained by decoding the sequence output by the recurrent neural network according to the encoding rule. Alternatively, when the model structure generator is implemented as a reinforcement learning algorithm, it may generate a state sequence, which is decoded according to the encoding rule to obtain the candidate model structure.
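The encode/decode step can be pictured with a toy example. The operation names and the trivial encoding rule below are our own invention for illustration; the patent leaves the concrete rule open:

```python
# A made-up search space of basic building units. In this toy encoding
# rule, each token of the generator's output sequence indexes one unit.
SEARCH_SPACE = ["conv3x3", "conv5x5", "maxpool3x3", "identity"]

def decode(token_sequence):
    """Map a token sequence emitted by the model structure generator
    to a stack of basic building units (a candidate model structure)."""
    return [SEARCH_SPACE[t % len(SEARCH_SPACE)] for t in token_sequence]

# decode([0, 2, 1]) -> ["conv3x3", "maxpool3x3", "conv5x5"]
```

A real encoding rule would also carry connection information between units; this sketch only shows the sequence-to-structure mapping.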
The searched-out candidate model structure may then be trained with training data until it converges. After that, the classification threshold of the converged candidate model structure at each preset recall rate can be obtained, where each preset recall rate is the same value as the one used to compute the corresponding classification threshold of the model structure to be replaced.
The training data may be a sample data set collected by the server through the terminal devices, a training data set read from local storage or a knowledge base, or a training data set obtained via the Internet. In this embodiment, the training data may be training data for a classification task, such as image data in a target recognition or identity authentication task. Whether the candidate model structure has converged may be judged by whether a preset performance index reaches a preset convergence threshold; for example, if classification accuracy is the index and 90% is the threshold, the candidate model structure is judged to have converged when its classification accuracy reaches 90%.
In this embodiment, it may also be determined whether training of the candidate model is finished according to the training time, that is, the model structure generator is used to search the candidate model structure in the search space, and the training data is used to train the candidate model structure until the training time reaches the preset training time, so as to obtain the classification threshold of the trained candidate model structure at the preset recall rate.
Step 2022, generating feedback information according to the difference between the classification threshold values of the trained candidate model structure and the model structure to be replaced under the same preset recall rate, and updating the model structure generator based on the feedback information before executing the next iteration operation.
In this embodiment, feedback information is generated according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced at the same preset recall rate, and the parameters of the model structure generator are updated based on the feedback information before the next round of the iterative operation. The difference may be a mathematical result computed from the two classification thresholds, for example their difference or their quotient. Alternatively, a difference threshold may be preset, and a binary symbol may indicate whether the gap between the two classification thresholds exceeds that difference threshold; other machine-readable expressions that characterize the difference are also possible. The model structure generator may then be updated based on the feedback information, either by updating its parameters or by updating its structure sampling strategy. For example, when the model structure generator is a recurrent neural network, its parameters can be updated by back-propagating the feedback information with gradient descent; when the model structure generator is a reinforcement learning algorithm, the feedback information is treated as a reward, and the model structure generator regenerates the state sequence based on that reward.
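The alternative encodings of the threshold gap mentioned above (difference, quotient, or a binary symbol against a preset difference threshold) might be sketched as follows; the function name, the `mode` parameter, and the default difference threshold are our assumptions, not part of the patent:

```python
def threshold_feedback(candidate_t, replaced_t, mode="difference",
                       diff_threshold=0.05):
    """Encode the gap between the candidate structure's classification
    threshold and the replaced structure's threshold as feedback."""
    if mode == "difference":
        return candidate_t - replaced_t          # signed gap
    if mode == "quotient":
        return candidate_t / replaced_t          # ratio of thresholds
    if mode == "binary":
        # 0 when the thresholds are considered equal, 1 otherwise
        return 0 if abs(candidate_t - replaced_t) < diff_threshold else 1
    raise ValueError(f"unknown mode: {mode}")
```

Whichever encoding is chosen, a smaller gap should translate into more favorable feedback so the generator drifts toward structures whose thresholds match those of the model structure to be replaced.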
Step 2023, in response to determining that the model structure generator reaches the preset convergence condition, determining the candidate model structure in the current iteration operation as the target model structure.
In this embodiment, when the model structure generator reaches a preset convergence condition, the iteration operation is ended, and the candidate model structure searched in the last iteration operation is determined as the target model structure. The condition that the model structure generator reaches the preset convergence condition may be that the iteration number of the model structure generator reaches the preset iteration number, that the iteration time of the model structure generator reaches the preset iteration time, or that the performance of the candidate model structure searched by the model structure generator reaches the expected performance.
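Steps 2021 to 2023 can be summarized in a schematic loop. All names here are ours; the generator, trainer, and threshold routines are stand-ins for the components the patent describes, and the feedback shown is one simple choice (the negated sum of threshold gaps):

```python
def search_target_structure(generator, search_candidate, train, thresholds_at,
                            replaced_thresholds, max_iters=100):
    """Schematic of the iterative search: sample a candidate, train it,
    compare classification thresholds per recall rate, feed the difference
    back, and stop when the generator converges."""
    candidate = None
    for _ in range(max_iters):
        candidate = search_candidate(generator)      # step 2021: sample
        trained = train(candidate)
        cand_thresholds = thresholds_at(trained)     # threshold per recall rate
        # step 2022: feedback from threshold gaps (smaller gap -> higher value)
        feedback = -sum(abs(cand_thresholds[r] - replaced_thresholds[r])
                        for r in replaced_thresholds)
        if generator.converged():                    # step 2023: stop
            return candidate
        generator.update(feedback)                   # before next round
    return candidate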
The method for searching a model structure provided by this embodiment generates feedback information according to the difference between the classification thresholds of the candidate model structure and the model structure to be replaced at the same recall rate, and iteratively updates the network parameters of the model structure generator according to that feedback. In this way, the difference between the features the two structures extract from classification objects is fed back to the model structure generator, so that it generates candidate model structures whose extracted features are consistent with those of the model structure to be replaced, and the threshold of the finally searched-out target model structure at each preset recall rate is close to that of the model structure to be replaced at the same recall rate. Therefore, when the searched-out target model structure replaces the model structure to be replaced, the classification threshold does not need to be redesigned or changed, which reduces the complexity of replacing an online model structure. Moreover, because the feedback is generated from threshold differences and the model structure to be replaced does not need to be retrained, the hardware computation and memory resources consumed by training are reduced, saving hardware device resources.
The method of the embodiment can determine the target model structure for constructing the neural network model for executing the image processing task, and a large amount of matrix operation is involved because the image data is generally converted into matrix data in the processing of the neural network model. In the embodiment, feedback information is generated by using the difference of the classification threshold values of the candidate model structure and the model structure to be replaced under the same recall rate, and the network parameters of the model structure generator are updated iteratively according to the feedback information, so that the time cost for retraining the target model structure and the consumption of hardware resources can be saved, and the efficiency for constructing the neural network model for executing the image processing task is improved.
With further reference to FIG. 3, a flow 300 of yet another embodiment of a method for searching a model structure is shown. The process 300 of the method for searching a model structure includes the steps of:
step 301, obtaining a classification threshold of a model structure to be replaced under at least one preset recall rate, wherein the classification threshold includes: and the model structure to be replaced maps the characteristics of the data to be classified to the threshold value adopted by the corresponding category.
Step 302, determining a search space of the model structure, initializing the model structure generator, and searching out the target model structure through multiple rounds of iterative operations.
The iterative operation comprises steps 3021 and 3022:
step 3021, searching a candidate model structure in the search space by using the model structure generator, training the candidate model structure, and obtaining a classification threshold of the trained candidate model structure at each preset recall rate.
Step 3022, determining performance information of the trained candidate model structure, and generating feedback information according to a difference between the trained candidate model structure and the classification threshold of the model structure to be replaced at the same preset recall rate, and the performance information of the trained candidate model structure, wherein the model structure generator is updated based on the feedback information before executing the next iteration operation.
In this embodiment, the performance information of the trained candidate model structure and the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced at the same preset recall rate are obtained first; the difference may be, for example, the difference between the two threshold values. Feedback information is then generated from these two quantities, and the network parameters of the model structure generator are updated in the next round of the iterative operation according to the feedback information. The feedback information is a performance signal used to adjust the network parameters of the neural network; by iteratively adjusting those parameters, the performance of the neural network at the end of iteration can meet expectations.
In this embodiment, the feedback information used to adjust the parameters of the neural network is generated from two factors: the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced at the same preset recall rate, and the performance information of the trained candidate model structure. As a result, the searched-out target model structure can replace the model structure to be replaced in the classification task without modifying the classification strategy, while its performance meets expectations, which improves the search accuracy.
In some optional implementations of this embodiment, the feedback information may also be generated as follows: generating a first feedback value according to the difference between the classification threshold values of the trained candidate model structure and the model structure to be replaced under the same preset recall rate; generating a second feedback value according to the performance information of the trained candidate model structure; and taking the weighted sum of the first feedback value and the second feedback value as the feedback information.
In this implementation, a first feedback value is generated according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced at the same preset recall rate, a second feedback value is generated according to the performance information of the trained candidate model structure, and the weighted sum of the two is taken as the feedback information. For example, if the first feedback value is 4 with a weight of 0.9, and the second feedback value is 12 with a weight of 0.1, then the feedback information is the weighted sum 0.9 × 4 + 0.1 × 12 = 4.8. By introducing the weights, the respective influence of the first and second feedback values on the feedback information can be adjusted, so that the final target model structure is more focused either on performance or on strategy similarity with the model structure to be replaced; the search result thus better matches the user's requirements, further improving the search accuracy.
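The weighted sum in the worked example above can be reproduced by a one-line computation; the function name and the default weights are illustrative assumptions, not values fixed by this method.

```python
def feedback_info(first_value, second_value, w1=0.9, w2=0.1):
    """Weighted sum of the two feedback values; the default weights
    are the figures from the worked example, not prescribed values."""
    return w1 * first_value + w2 * second_value

# Worked example from the text: 0.9 * 4 + 0.1 * 12 = 4.8
reward = feedback_info(4, 12)
```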
Optionally, in this implementation, the weight of the first feedback value is smaller than the weight of the second feedback value. In that case, the search strategy of the model structure generator focuses more on finding a model structure with good performance, which improves the performance of the finally searched-out target model structure.
Step 3023, in response to determining that the model structure generator reaches the preset convergence condition, determining the candidate model structure in the current iteration operation as the target model structure.
Step 301, step 302, step 3021, and step 3023 in this embodiment are respectively the same as step 201, step 202, step 2021, and step 2023 in the foregoing embodiment; for their specific implementations, reference may be made to the descriptions of the corresponding steps in the foregoing embodiment, which are not repeated here.
In some optional implementations of the embodiments described above in connection with fig. 2 and 3, the method for searching for a model structure further comprises: replacing the model structure to be replaced with a target model structure; and classifying the data to be classified by utilizing the target model structure.
According to the method for searching a model structure described above, the feedback information is generated according to the difference between the classification thresholds of the candidate model structure and the model structure to be replaced at the same preset recall rate, and the parameters of the model structure generator are updated iteratively according to the feedback information, so that the performance of the finally searched-out target model structure meets expectations and the accuracy of the model structure search is improved.
In some optional implementations of the foregoing embodiments, the iterative operation may further include: in response to determining that the model structure generator does not satisfy the preset convergence condition, executing the next iteration operation based on the updated model structure generator. In this way, the model structure generator is gradually optimized through repeated iteration, so that it generates a target model structure with a smaller feature loss relative to the model structure to be replaced, or searches out a target model structure that both has a smaller feature loss relative to the model structure to be replaced and performs better, thereby realizing automatic optimization of the model structure.
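Under strong simplifying assumptions, the repeated iterative operation can be sketched as the toy loop below, in which candidate "model structures" are plain numbers, the feedback is the classification-threshold gap to the model structure to be replaced, and convergence means that gap falling below a tolerance. None of these stand-ins reflect an actual implementation of the disclosed generator.

```python
import random

def search(target_threshold, space, tol=0.05, max_iters=1000, seed=0):
    """Toy stand-in for the iterative operation: sample candidates from
    the search space, score each by its threshold gap to the model
    structure to be replaced, and stop once a preset convergence
    condition (gap within tol) is satisfied."""
    rng = random.Random(seed)
    best, best_gap = None, float("inf")
    for _ in range(max_iters):
        candidate = rng.choice(space)             # "search out" a candidate
        gap = abs(candidate - target_threshold)   # threshold difference = feedback
        if gap < best_gap:                        # "update the generator"
            best, best_gap = candidate, gap
        if best_gap <= tol:                       # convergence condition reached
            break
    return best                                   # the target model structure
```

With a search space of thresholds 0.0, 0.1, ..., 1.0 and a target threshold of 0.42, the loop settles on the candidate closest to the target.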
With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of an apparatus for searching a model structure, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 4, the apparatus 400 for searching a model structure of the present embodiment includes: an acquisition unit 401, a search unit 402, a calculation unit 4021, a generation unit 4022, and a determination unit 4023. The acquisition unit 401 is configured to acquire a classification threshold of the model structure to be replaced under at least one preset recall rate, where the classification threshold is a threshold adopted by the model structure to be replaced for mapping features of the data to be classified to the corresponding category. The search unit 402 is configured to determine a search space of the model structure, initialize a model structure generator, and search out a target model structure through multiple rounds of iterative operations. The search unit 402 includes: a calculation unit 4021 configured to, in the iterative operation, search out a candidate model structure in the search space using the model structure generator, train the candidate model structure, and acquire the classification threshold of the trained candidate model structure under each preset recall rate; a generation unit 4022 configured to, in the iterative operation, generate feedback information according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced at the same preset recall rate, the model structure generator being updated based on the feedback information before the next iteration operation is executed; and a determination unit 4023 configured to, in the iterative operation, determine the candidate model structure in the current iteration operation as the target model structure in response to determining that the model structure generator reaches a preset convergence condition.
The apparatus for searching a model structure provided by this embodiment generates the feedback information according to the difference between the classification thresholds of the candidate model structure and the model structure to be replaced at the same preset recall rate, and iteratively updates the parameters of the model structure generator according to the feedback information, so that the performance of the finally searched-out target model structure meets expectations and the accuracy of the model structure search is improved.
In some embodiments, the search unit 402 further includes a test unit configured to, in the iterative operation, determine the performance information of the trained candidate model structure; and the generation unit 4022 includes an information generation module configured to generate the feedback information according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced at the same preset recall rate and the performance information of the trained candidate model structure.
In some embodiments, the information generation module includes: a first generation module configured to generate a first feedback value according to the difference between the classification thresholds of the trained candidate model structure and the model structure to be replaced at the same preset recall rate; a second generation module configured to generate a second feedback value according to the performance information of the trained candidate model structure; and a fusion module configured to take a weighted sum of the first feedback value and the second feedback value as the feedback information.
In some embodiments, the weight of the first feedback value is less than the weight of the second feedback value.
In some embodiments, the means for searching the model structure further comprises: a replacement unit configured to replace the model structure to be replaced with the target model structure; a processing unit configured to classify the data to be classified using the target model structure.
The units in the apparatus 400 described above correspond to the steps in the method described with reference to fig. 2 and 3. Thus, the operations, features, and technical effects described above for the method for searching a model structure are also applicable to the apparatus 400 and the units included therein, and are not repeated here.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device for the method of searching a model structure according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used together with multiple memories, as desired. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.
The memory 502 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor performs the method for searching a model structure provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the method for searching a model structure provided herein.
The memory 502, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for searching a model structure in the embodiments of the present application (for example, the acquisition unit 401, the search unit 402, the calculation unit 4021, the generation unit 4022, and the determination unit 4023 shown in fig. 4). By running the non-transitory software programs, instructions, and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the server, that is, implements the method for searching a model structure in the above method embodiments.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device for searching for the model structure, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 502 optionally includes memory located remotely from processor 501, which may be connected to an electronic device for searching model structures via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for searching a model structure may further include: an input device 503, an output device 504, and a bus 505. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus 505 or other means, and fig. 5 illustrates an example in which these are connected by the bus 505.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for searching a model structure; examples include a touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more mouse buttons, trackball, and joystick. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method for searching a model structure, comprising:
obtaining a classification threshold value of a model structure to be replaced under at least one preset recall rate, wherein the classification threshold value comprises: the model structure to be replaced maps the characteristics of the data to be classified to the threshold value adopted by the corresponding category;
determining a search space of a model structure, initializing a model structure generator, and searching out a target model structure through multiple rounds of iterative operations;
the iterative operation comprises:
searching out a candidate model structure in the search space by using the model structure generator, training the candidate model structure and acquiring a classification threshold value of the trained candidate model structure under each preset recall rate;
generating feedback information according to the difference between the trained candidate model structure and the classification threshold value of the model structure to be replaced under the same preset recall rate, wherein the model structure generator is updated based on the feedback information before executing the next iteration operation;
in response to determining that the model structure generator reaches a preset convergence condition, determining the candidate model structure in the current iteration operation as a target model structure.
2. The method of claim 1, wherein the iterative operations further comprise:
determining performance information of the trained candidate model structure;
the generating feedback information according to the difference between the classification threshold values of the trained candidate model structure and the model structure to be replaced under the same preset recall rate includes:
and generating feedback information according to the difference between the classification threshold values of the trained candidate model structure and the model structure to be replaced under the same preset recall rate and the performance information of the trained candidate model structure.
3. The method of claim 2, wherein the generating feedback information according to the difference between the trained candidate model structure and the model structure to be replaced under the same preset recall ratio and the performance information of the trained candidate model structure comprises:
generating a first feedback value according to the difference between the trained candidate model structure and the classification threshold value of the model structure to be replaced under the same preset recall rate;
generating a second feedback value according to the performance information of the trained candidate model structure;
and taking the weighted sum of the first feedback value and the second feedback value as the feedback information.
4. The method of claim 3, wherein the first feedback value is weighted less than the second feedback value.
5. The method of any of claims 1-4, wherein the method further comprises:
replacing the model structure to be replaced with the target model structure;
and classifying the data to be classified by utilizing the target model structure.
6. An apparatus for searching a model structure, comprising:
an obtaining unit configured to obtain a classification threshold of a model structure to be replaced under at least one preset recall rate, wherein the classification threshold includes: the model structure to be replaced maps the characteristics of the data to be classified to the threshold value adopted by the corresponding category;
the searching unit is configured to determine a searching space of the model structure, initialize the model structure generator and search out a target model structure through multiple rounds of iterative operations;
the search unit includes:
a computing unit configured to perform the following steps in the iterative operation: searching out a candidate model structure in the search space by using the model structure generator, training the candidate model structure and acquiring a classification threshold value of the trained candidate model structure under each preset recall rate;
a generating unit configured to perform the following steps in the iterative operation: generating feedback information according to the difference between the trained candidate model structure and the classification threshold value of the model structure to be replaced under the same preset recall rate, wherein the model structure generator is updated based on the feedback information before executing the next iteration operation;
a determining unit configured to perform the following steps in the iterative operation: in response to determining that the model structure generator reaches a preset convergence condition, determining the candidate model structure in the current iteration operation as a target model structure.
7. The apparatus of claim 6, wherein the search unit further comprises:
a test unit configured to perform the following steps in the iterative operation: determining performance information of the trained candidate model structure;
the generation unit includes:
and the information generation module is configured to generate feedback information according to the difference between the trained candidate model structure and the classification threshold value of the model structure to be replaced under the same preset recall rate and the performance information of the trained candidate model structure.
8. The apparatus of claim 7, wherein the information generating module comprises:
the first generation module is configured to generate a first feedback value according to the difference between the trained candidate model structure and the classification threshold value of the model structure to be replaced under the same preset recall rate;
a second generating module configured to generate a second feedback value according to the performance information of the trained candidate model structure;
a fusion module configured to take a weighted sum of the first feedback value and the second feedback value as the feedback information.
9. The apparatus of claim 8, wherein the first feedback value is weighted less than the second feedback value.
10. The apparatus of any of claims 6-9, wherein the apparatus further comprises:
a replacement unit configured to replace the model structure to be replaced with the target model structure;
a processing unit configured to classify data to be classified using the target model structure.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
CN202010503202.3A 2020-06-05 2020-06-05 Method and apparatus for searching model structures Active CN111667056B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010503202.3A CN111667056B (en) 2020-06-05 2020-06-05 Method and apparatus for searching model structures

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010503202.3A CN111667056B (en) 2020-06-05 2020-06-05 Method and apparatus for searching model structures

Publications (2)

Publication Number Publication Date
CN111667056A true CN111667056A (en) 2020-09-15
CN111667056B CN111667056B (en) 2023-09-26

Family

ID=72386474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010503202.3A Active CN111667056B (en) 2020-06-05 2020-06-05 Method and apparatus for searching model structures

Country Status (1)

Country Link
CN (1) CN111667056B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989361A (en) * 2021-04-14 2021-06-18 华南理工大学 Model security detection method based on generation countermeasure network
CN113076903A (en) * 2021-04-14 2021-07-06 上海云从企业发展有限公司 Target behavior detection method and system, computer equipment and machine readable medium
CN113806519A (en) * 2021-09-24 2021-12-17 金蝶软件(中国)有限公司 Search recall method, device and medium
WO2022166069A1 (en) * 2021-02-03 2022-08-11 上海商汤智能科技有限公司 Deep learning network determination method and apparatus, and electronic device and storage medium
CN116188834A (en) * 2022-12-08 2023-05-30 赛维森(广州)医疗科技服务有限公司 Full-slice image classification method and device based on self-adaptive training model

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463704A (en) * 2017-08-16 2017-12-12 北京百度网讯科技有限公司 Searching method and device based on artificial intelligence
CN108959552A (en) * 2018-06-29 2018-12-07 北京百度网讯科技有限公司 Recognition methods, device, equipment and the storage medium of question and answer class query statement
CN109816116A (en) * 2019-01-17 2019-05-28 腾讯科技(深圳)有限公司 The optimization method and device of hyper parameter in machine learning model
US20190286984A1 (en) * 2018-03-13 2019-09-19 Google Llc Neural architecture search by proxy
CN110543944A (en) * 2019-09-11 2019-12-06 北京百度网讯科技有限公司 neural network structure searching method, apparatus, electronic device, and medium
CN110674326A (en) * 2019-08-06 2020-01-10 厦门大学 Neural network structure retrieval method based on polynomial distribution learning
CN110766142A (en) * 2019-10-30 2020-02-07 北京百度网讯科技有限公司 Model generation method and device
CN110782015A (en) * 2019-10-25 2020-02-11 腾讯科技(深圳)有限公司 Training method and device for network structure optimizer of neural network and storage medium
CN110852321A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Candidate frame filtering method and device and electronic equipment
CN110909877A (en) * 2019-11-29 2020-03-24 百度在线网络技术(北京)有限公司 Neural network model structure searching method and device, electronic equipment and storage medium
CN110956260A (en) * 2018-09-27 2020-04-03 瑞士电信公司 System and method for neural architecture search
CN111160448A (en) * 2019-12-26 2020-05-15 北京达佳互联信息技术有限公司 Training method and device for image classification model
CN111178546A (en) * 2019-12-31 2020-05-19 华为技术有限公司 Searching method of machine learning model, and related device and equipment
US20200160177A1 (en) * 2018-11-16 2020-05-21 Royal Bank Of Canada System and method for a convolutional neural network for multi-label classification with partial annotations
WO2022063247A1 (en) * 2020-09-28 2022-03-31 华为技术有限公司 Neural architecture search method and apparatus

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022166069A1 (en) * 2021-02-03 2022-08-11 上海商汤智能科技有限公司 Deep learning network determination method and apparatus, and electronic device and storage medium
CN112989361A (en) * 2021-04-14 2021-06-18 华南理工大学 Model security detection method based on generation countermeasure network
CN113076903A (en) * 2021-04-14 2021-07-06 上海云从企业发展有限公司 Target behavior detection method and system, computer equipment and machine readable medium
CN112989361B (en) * 2021-04-14 2023-10-20 South China University of Technology Model security detection method based on generative adversarial network
CN113806519A (en) * 2021-09-24 2021-12-17 Kingdee Software (China) Co., Ltd. Search recall method, device and medium
CN116188834A (en) * 2022-12-08 2023-05-30 Saiweisen (Guangzhou) Medical Technology Service Co., Ltd. Whole-slide image classification method and device based on adaptive training model
CN116188834B (en) * 2022-12-08 2023-10-20 Saiweisen (Guangzhou) Medical Technology Service Co., Ltd. Whole-slide image classification method and device based on adaptive training model

Also Published As

Publication number Publication date
CN111667056B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
CN112560912B (en) Classification model training method and device, electronic equipment and storage medium
CN111667056B (en) Method and apparatus for searching model structures
CN111639710A (en) Image recognition model training method, device, equipment and storage medium
CN111667054A (en) Method and device for generating neural network model, electronic equipment and storage medium
CN111737994A (en) Method, device and equipment for obtaining word vector based on language model and storage medium
CN111582479B (en) Distillation method and device for neural network model
CN111708876B (en) Method and device for generating information
CN111667057B (en) Method and apparatus for searching model structures
CN111461345B (en) Deep learning model training method and device
CN111737995A (en) Method, device, equipment and medium for training language model based on multiple word vectors
CN112016633A (en) Model training method and device, electronic equipment and storage medium
CN111737996A (en) Method, device and equipment for obtaining word vector based on language model and storage medium
CN111753914A (en) Model optimization method and device, electronic equipment and storage medium
CN111859953B (en) Training data mining method and device, electronic equipment and storage medium
CN111339759A (en) Method and device for training field element recognition model and electronic equipment
CN110675954A (en) Information processing method and device, electronic equipment and storage medium
CN111680517A (en) Method, apparatus, device and storage medium for training a model
CN111539209A (en) Method and apparatus for entity classification
CN111639753A (en) Method, apparatus, device and storage medium for training a hyper-network
CN112559870A (en) Multi-model fusion method and device, electronic equipment and storage medium
CN111914994A (en) Method and device for generating multilayer perceptron, electronic equipment and storage medium
CN114492788A (en) Method and device for training deep learning model, electronic equipment and storage medium
CN111967591A (en) Neural network automatic pruning method and device and electronic equipment
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN114547244A (en) Method and apparatus for determining information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant