CN114037061A - Pre-training model generation method and device, electronic equipment and storage medium


Info

Publication number
CN114037061A
Authority
CN
China
Prior art keywords
model structure
pruning
model
search space
candidate
Legal status
Pending
Application number
CN202111310486.5A
Other languages
Chinese (zh)
Inventor
希滕
张刚
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111310486.5A
Publication of CN114037061A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation using electronic means
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G06F 18/232: Non-hierarchical techniques
    • G06F 18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213: Non-hierarchical techniques with a fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Neurology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a method and apparatus for generating a pre-training model, an electronic device, and a storage medium, and relates to the technical field of artificial intelligence, in particular to computer vision and deep learning technology. The specific implementation scheme is as follows: determining a pruning search space according to a pre-training model; determining a candidate model structure set from the pruning search space, wherein the candidate model structure set comprises a plurality of candidate model structures; and, under the condition that a target model structure does not exist in the candidate model structure set, training the candidate model structure set until a predetermined condition is met, so as to obtain a target pre-training model.

Description

Pre-training model generation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and more particularly to computer vision and deep learning techniques. In particular, the present disclosure relates to a method and apparatus for generating a pre-training model, an electronic device, and a storage medium.
Background
A pre-training model is a task-agnostic model obtained by training a preset model on a large amount of training data. For a downstream task, the pre-training model can be fine-tuned with a small amount of training data related to that task to obtain a model for processing it. For example, downstream tasks may include image processing tasks, audio processing tasks, or text processing tasks, among others.
Disclosure of Invention
The disclosure provides a generation method and device of a pre-training model, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a method for generating a pre-training model, including: determining a pruning search space according to a pre-training model; determining a candidate model structure set from the pruning search space, wherein the candidate model structure set comprises a plurality of candidate model structures; and under the condition that the target model structure does not exist in the candidate model structure set, training the candidate model structure set until a preset condition is met to obtain a target pre-training model.
According to another aspect of the present disclosure, there is provided a generation apparatus of a pre-training model, including: the first determining module is used for determining a pruning search space according to the pre-training model; a second determining module, configured to determine a candidate model structure set from the pruning search space, where the candidate model structure set includes a plurality of candidate model structures; and the first obtaining module is used for training the candidate model structure set under the condition that the target model structure does not exist in the candidate model structure set until a preset condition is met, so as to obtain a target pre-training model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which the generation method and apparatus of a pre-trained model may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a method of generating a pre-trained model according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of a generation process of a pre-trained model according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates an example schematic diagram of a generation process of a pre-trained model according to an embodiment of the disclosure;
FIG. 5 schematically shows a block diagram of an apparatus for generating a pre-trained model according to an embodiment of the present disclosure; and
FIG. 6 schematically illustrates a block diagram of an electronic device adapted to implement a method of generating a pre-trained model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The pre-training model may be generated in two ways.
One approach is to generate a pre-training model using a manually designed model structure; that is, the pre-training model is obtained from a model structure designed by hand. For example, the manually designed model structure may be a ResNet (Residual Neural Network) based model structure or a Transformer-based model structure.
Another approach is to generate a pre-training model using a model structure found by automatic deep learning (AutoDL) search. That is, an AutoDL-based model structure can be obtained by an automatic search method over the ImageNet dataset, and a pre-training model is then generated using that model structure.
With the first approach, the prediction accuracy of a pre-training model generated from a manually designed model structure is not high. With the second approach, the data distribution of the ImageNet dataset differs from that of the training set used by the actual data processing task, so the prediction accuracy of a pre-training model generated from the AutoDL-based model structure is also not high.
Therefore, the embodiments of the present disclosure provide a scheme for generating a pre-training model: a candidate model structure set is determined from a pruning search space that is itself determined from a pre-training model, and, when no target model structure exists in the candidate model structure set, the candidate model structure set is trained until a predetermined condition is met, so as to obtain the target pre-training model. Automatic pruning thus yields a pre-training model that satisfies the performance evaluation condition, improving the prediction accuracy of the target pre-training model. A smaller pre-training model (i.e., the target pre-training model) can therefore reach the same prediction accuracy as a larger one, while training faster. On this basis, if the target pre-training model is deployed on a chip to perform data processing tasks, the core competitiveness of the related product can be improved.
Fig. 1 schematically illustrates an exemplary system architecture to which the generation method and apparatus of a pre-trained model may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the method and apparatus for generating a pre-training model may be applied may include a terminal device, but the terminal device may implement the method and apparatus for generating a pre-training model provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for content browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
The server 105 may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of conventional physical hosts and VPS (Virtual Private Server) services, namely high management difficulty and weak service extensibility. The server 105 may also be a server of a distributed system or a server that incorporates a blockchain.
It should be noted that the generation method of the pre-training model provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the generating device of the pre-training model provided by the embodiment of the present disclosure may be generally disposed in the server 105. The generation method of the pre-training model provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the content processing apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
Alternatively, the generation method of the pre-training model provided by the embodiment of the present disclosure may also be generally executed by the terminal device 101, 102, or 103. Correspondingly, the generating device of the pre-training model provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2 schematically shows a flow chart of a method of generating a pre-trained model according to an embodiment of the present disclosure.
As shown in FIG. 2, the method 200 includes operations S210-S230.
In operation S210, a pruning search space is determined according to the pre-training model.
In operation S220, a set of candidate model structures is determined from the pruning search space. The set of candidate model structures includes a plurality of candidate model structures.
In operation S230, under the condition that it is determined that the target model structure does not exist in the candidate model structure set, the candidate model structure set is trained until a predetermined condition is satisfied, so as to obtain a target pre-training model.
According to embodiments of the present disclosure, the initial pruning search space may be the full pruning search space and may include one or more types of pruning search space. For example, the initial pruning search space may include at least one of a ResNet (Residual Neural Network) based pruning search space, a MobileNet-based pruning search space, a Transformer-based pruning search space, a heterogeneous pruning search space, and the like. A heterogeneous pruning search space is a pruning search space that combines different types of pruning search spaces.
According to embodiments of the present disclosure, the initial pruning search space may include a plurality of model structures. A model structure is a structure for performing a data processing task (i.e., a downstream task); the data processing task may include at least one of an image processing task, an audio processing task, a text processing task, and the like. Each model structure comprises at least one model substructure and the connection relationships between different model substructures; that is, each model structure is obtained by connecting its model substructures, which come from at least one operation layer, according to those connection relationships. For example, the at least one operation layer may include at least one of an input layer, a convolutional layer, a pooling layer, a fully-connected layer, a batch normalization layer, a non-linear layer, and the like. The at least one model substructure may include at least one of a convolution structure (i.e., convolution kernel), a pooling structure (i.e., pooling kernel), a fully-connected structure, a normalization structure, and the like. The hyper-parameters of different model substructures may be the same or different; the hyper-parameters of a model substructure may include at least one of its size, number, step size, and the like. For example, the hyper-parameters of a convolution structure may include the size of the convolution structure, the number of convolution structures, and the convolution step size. The connection relationships may include at least one of addition, channel concatenation, and the like.
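For illustration only, the composition described above can be sketched in Python; the class and field names (SubStructure, ModelStructure, kind, count) are assumptions made for exposition and are not defined by this disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SubStructure:
    # one model substructure, e.g. a convolution kernel; kind, size,
    # count and stride are the hyper-parameters named above
    kind: str            # "conv", "pool", "fc", "norm", ...
    size: int = 3        # e.g. a 3x3 convolution kernel
    count: int = 64      # number of kernels (output channels)
    stride: int = 1

@dataclass
class ModelStructure:
    substructures: List[SubStructure] = field(default_factory=list)
    # connection relationships between substructures, e.g.
    # ("add", 0, 2) or ("concat", 1, 3): an operation plus the
    # indices of the two connected substructures
    connections: List[Tuple[str, int, int]] = field(default_factory=list)

# a pruning search space is then simply a collection of such structures
PruningSearchSpace = List[ModelStructure]
```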
According to an embodiment of the present disclosure, the initial pruning search space may be generated according to a generation strategy of the pruning search space. The generation policy may be determined based on the generation requirements. For example, the number of expected model substructures, the type of model substructures, and the connection relationships between the model substructures may be determined according to the generation requirements. At least one model substructure is determined based on the number of model substructures and the type of model substructure. And associating at least one model substructure based on the connection relationship among the model substructures to obtain at least one model structure. And obtaining an initial pruning search space according to at least one model structure.
According to the embodiment of the disclosure, multiple rounds of training can be performed using model structures determined from the pruning search space until a predetermined condition is satisfied, so as to obtain the target pre-training model. Each round has a pruning search space corresponding to it, determined from the pre-training model corresponding to that round; the pre-training model corresponding to a round is the model obtained by training the model structures included in that round's pruning search space. The pruning search space includes a plurality of model structures, and the target model structure is the model structure among them that satisfies the predetermined condition. The pruning search space corresponding to each round can therefore be regarded as a subspace of the initial pruning search space, so its composition is the same as that of the initial pruning search space: it includes a plurality of model structures, each comprising at least one model substructure and the connection relationships between different model substructures. After the pre-training model corresponding to a round is trained, the model parameters of each model structure included in the trained pre-training model are determined, and thus the model parameters of each model structure included in that round's pruning search space are determined.
According to the embodiment of the disclosure, the complexity of the model structures included in the pruning search spaces of different rounds may be the same or different. The complexity of a model structure can be characterized by a pruning rate value: the larger the pruning rate value, the lower the complexity of the model structure.
According to an embodiment of the present disclosure, the candidate model structure set is the set from which the target model structure, i.e., the pre-training model, is determined. The candidate model structure set corresponding to each round may be determined from the model structures included in that round's pruning search space based on a screening policy; its members are referred to as candidate model structures. The screening policy may be determined based on screening requirements: for example, the expected number and types of model structures for each round may be determined from the round's screening requirements, and model structures matching those requirements may then be found in the round's pruning search space, for instance based on a random sampling strategy, to obtain the candidate model structure set corresponding to the round. The candidate model structure set corresponding to each round may include a number of candidate model structures greater than or equal to a predetermined number threshold, for example, one million.
According to an embodiment of the present disclosure, the current round in progress may be referred to as a current round. Thus, the set of candidate model structures corresponding to the current round may be referred to as the current set of candidate model structures. The pruning search space corresponding to the current round is referred to as a current pruning search space. The set of current candidate model structures may include a plurality of current candidate model structures.
According to an embodiment of the present disclosure, the predetermined condition may be used as a condition for determining a target model structure from a set of candidate model structures. For example, the predetermined conditions include a performance evaluation condition and a pruning condition.
According to an embodiment of the present disclosure, a current candidate model structure set is determined from the current pruning search space corresponding to the current round, and it is determined whether a target model structure exists in the current candidate model structure set. If a target model structure exists, it may be determined to be the target pre-training model. If not, the current candidate model structure set may be trained to obtain the next pre-training model, corresponding to the next round, and the next pruning search space corresponding to the next round is obtained from that next pre-training model. The next round is then taken as the new current round, whereby the next pre-training model is determined as the new current pre-training model and the next pruning search space is determined as the new current pruning search space. It should be noted that the "current candidate model structure set" that is trained may be all or part of the current candidate model structures included in the set.
The following operations are executed repeatedly until the predetermined condition is met: determining a new current candidate model structure set from the new current pruning search space; when no target model structure exists in that set, training it to obtain the next pre-training model corresponding to the next round; obtaining the next pruning search space from the next pre-training model; and determining the next pruning search space as the new current pruning search space and the next pre-training model as the new current pre-training model. The result is the target model structure, i.e., the target pre-training model.
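As a minimal sketch of this loop (illustrative only; the helper callables derive_search_space, sample_candidates, find_target and train stand in for the operations described above and are assumptions, not part of this disclosure):

```python
def generate_target_pretrained_model(pretrained_model, derive_search_space,
                                     sample_candidates, find_target, train,
                                     max_rounds=100):
    """One round: search space -> candidate set -> target check -> train."""
    current_model = pretrained_model
    for _ in range(max_rounds):
        search_space = derive_search_space(current_model)   # S210
        candidates = sample_candidates(search_space)        # S220
        target = find_target(candidates)                    # S230
        if target is not None:
            return target   # the target model structure is the result
        # no target yet: train the candidate set to obtain the
        # pre-training model for the next round
        current_model = train(candidates)
    raise RuntimeError("predetermined condition was not met")
```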
In the case where the current round is the first round, a training model structure set corresponding to the first round may be determined from the initial pruning search space, and the training set is processed using this set to obtain the trained pre-training model corresponding to the first round. The pruning search space corresponding to the first round is obtained from that pre-training model, and a candidate model structure set corresponding to the first round is determined from it. If no target model structure exists in the candidate model structure set corresponding to the first round, that set is trained to obtain the pre-training model corresponding to the second round, from which the pruning search space corresponding to the second round is obtained.
In the case where the current round is the second round, a candidate model structure set corresponding to the second round may be determined from the pruning search space corresponding to the second round. If no target model structure exists in it, that set is trained to obtain the pre-training model corresponding to the third round, from which the pruning search space corresponding to the third round is obtained.
The above training process is executed until a target model structure satisfying the predetermined condition is obtained, i.e., until the target pre-training model is obtained.
According to the embodiments of the present disclosure, a candidate model structure set is determined from a pruning search space determined from a pre-training model, and, when no target model structure exists in that set, the set is trained until the predetermined condition is met, so as to obtain the target pre-training model. Automatic pruning thus yields a pre-training model satisfying the performance evaluation condition and improves the prediction accuracy of the target pre-training model. A smaller pre-training model (i.e., the target pre-training model) can therefore reach the same prediction accuracy as a larger one, while training faster. On this basis, if the target pre-training model is deployed on a chip to perform data processing tasks, the core competitiveness of the related product can be improved.
According to an embodiment of the present disclosure, the method for generating the pre-training model may further include the following operations.
The candidate model structure set is processed using the performance predictor corresponding to the pruning search space to obtain a performance index set of the candidate model structure set.
According to an embodiment of the present disclosure, operation S230 may include the following operations.
When it is determined, according to the performance index set and the pruning information set of the candidate model structure set, that no target model structure exists in the candidate model structure set, a candidate model structure subset is determined from the candidate model structure set according to the performance index set and the pruning information set, and the candidate model structure subset is trained until the predetermined condition is met, so as to obtain the target pre-training model.
According to embodiments of the present disclosure, the performance predictor is used to predict the performance of a model structure; it characterizes the relationship between a model structure and its performance. The performance predictor may be trained using a machine learning model or a deep learning model; for example, the machine learning model may be a random forest model or a ridge regression model. The performance predictor may also be built from a statistical model of the relationship between a model structure and its performance; the statistical model may be a probability distribution model, for example, a Gaussian distribution model.
According to an embodiment of the present disclosure, a performance index set may include a plurality of performance indexes, each corresponding to a performance index item. A performance index item is a dimension along which the performance of a model structure is evaluated, and the performance index is its value for a candidate model structure. The at least one performance index item may include at least one of prediction accuracy, running speed, hardware latency, memory occupancy, processor power consumption, and operational efficiency. Accordingly, the at least one performance index may include at least one of a prediction accuracy value, a running speed value, a hardware latency value, a memory footprint value, a processor power consumption value, an operational efficiency value, and the like.
According to an embodiment of the present disclosure, each candidate model structure in the candidate model structure set has at least one performance index corresponding to it. If the performance index set is divided by performance index item, it comprises one subset per performance index item, which may be referred to as a performance index class set; each performance index class set contains the performance indexes, one per candidate model structure, that belong to the same performance index item.
According to an embodiment of the present disclosure, the set of pruning information may include a plurality of pruning information. The pruning information may include pruning rate values for the model structure. Pruning information may be used as one of the bases for evaluating whether a candidate model structure is a target model structure. The pruning information may be preset. Each model substructure has a pruning information corresponding to the model substructure. Each model structure has a plurality of pruning information corresponding to the model structure.
According to an embodiment of the present disclosure, each round (i.e., each pruning search space) has a performance predictor, a performance index set, and pruning information corresponding to it. The performance predictor corresponding to the current round (i.e., to the current pruning search space) is referred to as the current performance predictor; the performance index set corresponding to the current round is referred to as the current performance index set, which may include a plurality of current performance indexes; and the pruning information set corresponding to the current round is referred to as the current pruning information set, which may include a plurality of current pruning information items.
According to an embodiment of the present disclosure, each current candidate model structure of a plurality of current candidate model structures included in the current candidate model structure set may be input into the current performance predictor, so as to obtain at least one current performance index corresponding to the current candidate model structure.
According to an embodiment of the present disclosure, after the at least one current performance index has been obtained for every current candidate model structure in the current candidate model structure set, it may be determined whether any current candidate model structure satisfying the predetermined condition exists in the set, according to each structure's at least one current performance index and at least one item of current pruning information.
According to an embodiment of the present disclosure, if it is determined that there is a current candidate model structure satisfying a predetermined condition in the current candidate model structure set, the current candidate model structure satisfying the predetermined condition may be determined as a pre-training model. If it is determined that the current candidate model structure meeting the predetermined condition does not exist in the current candidate model structure set, a current candidate model structure subset can be determined from the current candidate model structure set according to the current performance index set and the current pruning information set so as to train the current candidate model structure subset until the predetermined condition is met, and a target pre-training model is obtained.
According to embodiments of the present disclosure, the performance index may include a prediction accuracy value and the pruning information may include a pruning rate value. Determining a candidate model structure subset from the candidate model structure set according to the performance index set and the pruning information set may then include: determining, as the model structures included in the candidate model structure subset, those candidate model structures whose prediction accuracy value is greater than or equal to a prediction accuracy threshold and whose pruning rate value satisfies the pruning rate condition.
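A minimal sketch of this screening step (illustrative only; the pruning rate condition is assumed here to be a minimum pruning rate, and all names and default thresholds are hypothetical):

```python
def select_candidate_subset(candidates, accuracies, pruning_rates,
                            accuracy_threshold=0.9, min_pruning_rate=0.5):
    """Keep candidates whose predicted accuracy reaches the threshold
    and whose pruning rate value satisfies the assumed condition."""
    return [structure
            for structure, acc, rate in zip(candidates, accuracies, pruning_rates)
            if acc >= accuracy_threshold and rate >= min_pruning_rate]
```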
According to an embodiment of the present disclosure, the method for generating the pre-training model may further include the following operations.
A set of evaluation model structures is determined from the pruning search space. And obtaining a performance predictor corresponding to the pruning search space by utilizing the evaluation model structure set.
According to an embodiment of the present disclosure, the evaluation model structure set may include a plurality of model structures, referred to as evaluation model structures. An evaluation model structure is a representative model structure in the pruning search space corresponding to the pre-training model, where representative means able to characterize the features of the model structures in the pruning search space. The model parameters of an evaluation model structure may be determined from, i.e., kept consistent with, the parameters of the corresponding model structure in the pre-training model. The evaluation model structures are used to assist in building the performance predictor.
According to the embodiment of the disclosure, a plurality of model structures can be determined from the pruning search space based on the representative strategy, and an evaluation model structure set is obtained according to the plurality of model structures. And obtaining a performance predictor by utilizing an evaluation model structure set based on the evaluation set. The evaluation set may include a plurality of training samples.
According to an embodiment of the present disclosure, obtaining a performance predictor by using an evaluation model structure set based on an evaluation set may include: the evaluation set can be processed by utilizing the evaluation model structure set to obtain a performance index set corresponding to the evaluation model structure set. And obtaining the performance predictor by utilizing the evaluation model structure set, the performance index set corresponding to the evaluation model structure set and the predetermined model.
According to an embodiment of the present disclosure, obtaining the performance predictor by using the evaluation model structure set, the performance index set corresponding to the evaluation model structure set, and the predetermined model may include: and updating the hyper-parameters of the initial probability model by using a prediction method based on the performance index set corresponding to the evaluation model structure set to obtain the prediction values of the hyper-parameters. Based on the predicted values of the hyper-parameters, a performance predictor is determined. The initial probability model may be a probability distribution model obtained by initializing a probability distribution model corresponding to the initial pruning search space.
According to an embodiment of the present disclosure, obtaining the performance predictor by using the evaluation model structure set, the performance index set corresponding to the evaluation model structure set, and the predetermined model may include: and training the machine learning model or the deep learning model by utilizing the evaluation model structure set and the performance index set corresponding to the evaluation model structure set to obtain a performance predictor.
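For the machine-learning variant, a minimal sketch using scikit-learn's random forest (one of the models named above) follows; the representation of a model structure as a fixed-length numeric encoding is an assumption, not something specified by this disclosure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_performance_predictor(structure_encodings, performance_indexes):
    """Fit a regressor mapping model-structure encodings to a measured
    performance index (e.g. prediction accuracy on the evaluation set)."""
    X = np.asarray(structure_encodings, dtype=float)  # one row per structure
    y = np.asarray(performance_indexes, dtype=float)
    predictor = RandomForestRegressor(n_estimators=100, random_state=0)
    predictor.fit(X, y)
    return predictor

# usage sketch:
# predictor = fit_performance_predictor(eval_encodings, eval_accuracies)
# predicted_accuracy = predictor.predict(candidate_encodings)
```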
According to the embodiment of the disclosure, the performance predictor corresponding to the first round may be trained in the above manner and then determined as the performance predictor for every other round. That is, the performance predictor corresponding to the initial pruning search space may be trained in the above manner and determined as the performance predictor corresponding to the other pruning search spaces. Any other round refers to any round other than the first round.
According to the embodiment of the disclosure, by training the performance predictor corresponding to the initial pruning search space and determining the performance predictor corresponding to the initial pruning search space as the performance predictor corresponding to other pruning search spaces, the time consumption for generating the pre-training model can be reduced, and the efficiency for generating the pre-training model can be improved.
According to the embodiment of the present disclosure, the performance predictor corresponding to each round may also be trained for each round by using the above-mentioned training method, that is, the performance predictor corresponding to each pruning search space may be trained for each pruning search space of each round.
According to an embodiment of the present disclosure, obtaining a current performance predictor corresponding to a current pruning search space by using a current evaluation model structure set may include: and adjusting the model parameters of the previous performance predictor corresponding to the previous pruning search space by using the current evaluation model structure set to obtain the current performance predictor corresponding to the current pruning search space.
According to the embodiment of the disclosure, the prediction accuracy of the performance predictor can be improved by training the performance predictor corresponding to each pruning search space, and the prediction accuracy of the pre-training model is further improved.
According to an embodiment of the present disclosure, obtaining a performance predictor corresponding to a pruning search space by using an evaluation model structure set may include the following operations.
And processing the evaluation set by using the evaluation model structure set to obtain a performance index set of the evaluation model structure set. And obtaining a performance predictor corresponding to the pruning search space by utilizing the evaluation model structure set and the performance index set.
According to the embodiment of the disclosure, for each evaluation model structure in the evaluation model structure set, the evaluation set is input into that evaluation model structure to obtain the performance index corresponding to it. The performance predictor corresponding to the pruning search space is then obtained using the evaluation model structure set, the corresponding performance index set, and a predetermined model. The predetermined model may be a machine learning model, a deep learning model, a statistical model, or the like.
According to an embodiment of the present disclosure, obtaining a performance predictor corresponding to a pruning search space by using an evaluation model structure set and a performance index set may include the following operations.
An evaluation model encoding set of the evaluation model structure set is determined. And obtaining a performance predictor corresponding to the pruning search space by utilizing the evaluation model coding set and the performance index set.
According to an embodiment of the present disclosure, a model structure may be characterized by a model encoding; that is, each evaluation model structure in the evaluation model structure set may be processed by an encoding generator to obtain the evaluation model encoding corresponding to it.
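One possible, purely illustrative encoding generator, reusing the ModelStructure sketch above; real encodings would typically be padded or truncated to a fixed length before being fed to a predictor, as done here.

```python
def encode_structure(structure, kinds=("conv", "pool", "fc", "norm"),
                     max_substructures=16):
    """Emit (kind index, size, count, stride) per substructure and pad
    with zeros so every encoding has the same dimension."""
    vector = []
    for sub in structure.substructures[:max_substructures]:
        vector.extend([kinds.index(sub.kind), sub.size, sub.count, sub.stride])
    vector.extend([0] * (4 * max_substructures - len(vector)))  # padding
    return vector
```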
According to an embodiment of the present disclosure, training a subset of candidate model structures may include the following operations.
The model parameters of the candidate model structure subset are adjusted based on a re-parameterization method.
According to the embodiment of the disclosure, the candidate model structure subset can be trained using the re-parameterization method until the predetermined condition is met, so as to obtain the target pre-training model.
Determining a set of evaluation model structures from a pruning search space, in accordance with embodiments of the present disclosure, may include the following operations.
An information entropy of each model structure of a plurality of model structures included in a pruning search space is determined. And determining an evaluation model structure set from the pruning search space according to the information entropy of each model structure in the plurality of model structures included in the pruning search space.
According to embodiments of the present disclosure, information entropy may be used to characterize a metric of information quantity. The information entropy of the model structure may be utilized to determine a set of evaluation model structures from a plurality of model structures included in the pruning search space.
According to an embodiment of the present disclosure, a model encoding may be determined for each of a plurality of model structures included in a pruning search space. And determining a covariance matrix according to the hyper-parameters of the probability model and the model code of each model structure. And determining the information entropy of each model structure according to the covariance matrix. The above-described manner of determining the information entropy of the model structure is only an exemplary embodiment, but is not limited thereto, and may also include a determination manner known in the art as long as the determination of the information entropy of the model structure can be achieved.
According to an embodiment of the present disclosure, determining an evaluation model structure set from a pruning search space according to an information entropy of each model structure of a plurality of model structures included in the pruning search space may include: the information entropy of each model structure in a plurality of model structures included in the pruning search space is sorted. And determining an evaluation model structure set from the pruning search space according to the sorting result. The sorting may include sorting according to the order of the information entropy from small to large or sorting according to the order of the information entropy from large to small. For example, each of the plurality of model structures included in the pruning search space may be sorted in order of decreasing entropy of the model structures, and a predetermined number of model structures sorted in the top of the sorting result may be determined as the evaluation model structure set. Alternatively, the set of evaluation model structures may be determined from the plurality of model structures included in the pruning search space based on an information entropy threshold and an information entropy of each of the plurality of model structures included in the pruning search space. For example, for each of a plurality of model structures included in the pruning search space, in a case where it is determined that the information entropy of the model structure is greater than or equal to the information entropy threshold, the model structure is determined as the evaluation model structure.
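A sketch of the entropy-based selection, assuming a multivariate Gaussian probability model so that the information entropy follows from the covariance matrix in closed form (the choice of model and the top_k cut-off are assumptions):

```python
import numpy as np

def gaussian_entropy(covariance):
    """Differential entropy of a multivariate Gaussian:
    H = 0.5 * log((2*pi*e)^d * det(Sigma))."""
    d = covariance.shape[0]
    _, logdet = np.linalg.slogdet(covariance)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

def select_by_entropy(structures, entropies, top_k):
    """Sort by descending information entropy and keep the top_k
    structures as the evaluation model structure set."""
    order = np.argsort(entropies)[::-1]
    return [structures[i] for i in order[:top_k]]
```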
Determining a set of evaluation model structures from a pruning search space, in accordance with embodiments of the present disclosure, may include the following operations.
And determining at least one clustering center of the pruning search space according to a plurality of model structures included in the pruning search space. And obtaining an evaluation model structure set according to at least one clustering center of the pruning search space.
According to the embodiment of the disclosure, a clustering algorithm can be used to process the model structures included in the current pruning search space to obtain at least one cluster center corresponding to the pruning search space. The clustering algorithm may be a K-means clustering algorithm, a K-center clustering algorithm, a CLARA (Clustering LARge Applications) algorithm, or a fuzzy C-means algorithm.
According to an embodiment of the present disclosure, each of at least one cluster center corresponding to a pruning search space may be determined as an evaluation model structure.
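A sketch using scikit-learn's K-means (one of the clustering algorithms named above); taking the structure nearest to each cluster centre is one reasonable reading of "determining each cluster centre as an evaluation model structure" when centres are not themselves valid encodings:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_by_clustering(structures, encodings, n_clusters=8):
    """Cluster structure encodings and return, for each cluster centre,
    the structure whose encoding lies nearest to it."""
    X = np.asarray(encodings, dtype=float)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    selected = []
    for centre in kmeans.cluster_centers_:
        nearest = int(np.argmin(np.linalg.norm(X - centre, axis=1)))
        selected.append(structures[nearest])
    return selected
```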
According to an embodiment of the present disclosure, the predetermined condition includes a performance evaluation condition and a pruning condition. The pruning condition comprises a pruning rate value condition or a pruning round condition, and the pruning information correspondingly includes a pruning rate value or a pruning round.
According to an embodiment of the present disclosure, the target model structure is a candidate model structure satisfying a predetermined condition among a plurality of candidate model structures, and may include one of:
the target model structure is a candidate model structure, among the plurality of candidate model structures, whose current performance index satisfies the performance evaluation condition and whose current pruning rate value satisfies the pruning rate value condition; or the target model structure is a candidate model structure, among the plurality of candidate model structures, whose current performance index satisfies the performance evaluation condition and whose current pruning round satisfies the pruning round condition.
According to embodiments of the present disclosure, the performance index may include a prediction accuracy value. That the target model structure is a candidate model structure whose performance index satisfies the performance evaluation condition and whose pruning rate value satisfies the pruning rate condition may include: the target model structure is the candidate model structure, among the plurality of candidate model structures, that has the largest prediction accuracy value and whose pruning rate value satisfies the pruning rate condition.
According to an embodiment of the present disclosure, that the target model structure is a candidate model structure whose performance index satisfies the performance evaluation condition and whose pruning round satisfies the pruning round condition may include: the target model structure is the candidate model structure, among the plurality of candidate model structures, that has the largest prediction accuracy value in the last pruning round.
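Both target-selection cases reduce to "largest prediction accuracy among the candidates whose pruning condition holds"; a sketch with hypothetical names, where pruning_ok marks whichever pruning condition applies:

```python
def find_target_structure(candidates, accuracies, pruning_ok):
    """Return the admissible candidate (pruning condition holds) with
    the largest prediction accuracy value, or None if no candidate
    satisfies the pruning condition."""
    admissible = [(acc, i) for i, (acc, ok) in
                  enumerate(zip(accuracies, pruning_ok)) if ok]
    if not admissible:
        return None
    _, best_index = max(admissible)
    return candidates[best_index]
```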
The method for generating the pre-training model according to the embodiment of the disclosure is further described with reference to fig. 3 to 4.
Fig. 3 schematically illustrates a schematic diagram of a generation process of a pre-trained model according to an embodiment of the present disclosure.
As shown in fig. 3, at 300, an evaluation model structure set 302 is determined from a pruning search space 301, and the evaluation set 303 is processed using the evaluation model structure set 302 to obtain a performance index set 304 of the evaluation model structure set 302.
The performance predictor 306 is obtained by using the evaluation model structure set 302, the performance index set 304 and the predetermined model 305.
A set of candidate model structures 307 is determined from the pruning search space 301. The performance predictor 306 is utilized to process the candidate model structure set 307, and a performance index set 308 of the candidate model structure set 307 is obtained.
If it is determined, from the performance index set 308 and the pruning information set 309, that a model structure satisfying the predetermined condition exists in the candidate model structure set 307, that candidate model structure is determined as the target model structure 310, and the target model structure 310 is determined as the target pre-training model 311.
If it is determined, from the performance index set 308 and the pruning information set 309, that no target model structure exists in the candidate model structure set 307, the candidate model structure set 307 is trained until the predetermined condition is met, so as to obtain the target pre-training model.
FIG. 4 schematically illustrates an example schematic of a generation process of a pre-trained model according to an embodiment of the disclosure.
As shown in fig. 4, at 400, each candidate model structure includes a plurality of convolution kernels coming from different operation layers; that is, each current candidate model structure includes multiple operation layers, and each operation layer includes a plurality of convolution kernels. The pruning rate of each layer is related to the number of convolution kernels the layer includes, and may take one of four values: 0.05, 0.35, 0.65, and 0.95. The initial pruning search space may thus include 81 model structures.
In the case that the current round is the first round, the current candidate model structure set corresponding to the current pruning search space (i.e., the initial pruning search space) includes 81 current candidate model structures, namely current candidate model structure 401, … …, current candidate model structure 450, … …, and current candidate model structure 481.
By using the generation method of the pre-training model described in the embodiments of the present disclosure, the operation is repeated 14 times, so as to obtain the current candidate model structure set corresponding to the 15th round. This set includes 49 current candidate model structures, namely current candidate model structure 450, … …, and current candidate model structure 481.
The operation is repeated a further 20 times, and the current candidate model structure 450 is determined as the target model structure satisfying the predetermined condition. The target model structure is then determined as the target pre-training model.
The above is only an exemplary embodiment, and the present disclosure is not limited thereto; other generation methods of the pre-trained model known in the art may also be used, as long as the prediction accuracy of the pre-trained model can be improved.
Fig. 5 schematically shows a block diagram of a generation apparatus of a pre-trained model according to an embodiment of the present disclosure.
As shown in fig. 5, the generating apparatus 500 of the pre-training model may include a first determining module 510, a second determining module 520, and a first obtaining module 530.
The first determining module 510 is configured to determine a pruning search space according to the pre-training model.
A second determining module 520, configured to determine a set of candidate model structures from the pruning search space. The set of candidate model structures includes a plurality of candidate model structures.
A first obtaining module 530, configured to, in the case that it is determined that the target model structure does not exist in the candidate model structure set, train the candidate model structure set until a predetermined condition is met, so as to obtain a target pre-training model.
According to an embodiment of the present disclosure, the apparatus 500 for generating a pre-training model may further include a second obtaining module.
The second obtaining module is configured to process the candidate model structure set by using the performance predictor corresponding to the pruning search space, so as to obtain a performance index set of the candidate model structure set.
according to an embodiment of the present disclosure, the first obtaining module 530 may include a first determining submodule and a first obtaining submodule.
The first determining submodule is configured to, in the case that it is determined, according to the performance index set and the pruning information set of the candidate model structure set, that the target model structure does not exist in the candidate model structure set, determine a candidate model structure subset from the candidate model structure set according to the performance index set and the pruning information set.
The first obtaining submodule is configured to train the candidate model structure subset until the predetermined condition is met, so as to obtain the target pre-training model.
According to an embodiment of the present disclosure, the apparatus 500 for generating a pre-training model may further include a third determining module and a third obtaining module.
The third determining module is configured to determine an evaluation model structure set from the pruning search space.
The third obtaining module is configured to obtain the performance predictor corresponding to the pruning search space by using the evaluation model structure set.
According to an embodiment of the present disclosure, the third obtaining module may include a second obtaining sub-module and a third obtaining sub-module.
The second obtaining submodule is configured to process an evaluation set by using the evaluation model structure set, so as to obtain a performance index set corresponding to the pruning search space.
The third obtaining submodule is configured to obtain the performance predictor corresponding to the pruning search space by using the evaluation model structure set and the performance index set.
According to an embodiment of the present disclosure, the third obtaining sub-module may include a second determining unit and a second obtaining unit.
The second determining unit is configured to determine an evaluation model encoding set of the evaluation model structure set.
The second obtaining unit is configured to obtain the performance predictor corresponding to the pruning search space by using the evaluation model encoding set and the performance index set.
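As a hedged illustration of how such a performance predictor might be fitted (the disclosure does not prescribe a particular regressor), the following sketch trains a gradient-boosted regressor that maps an evaluation model encoding, here a vector of per-layer pruning rates, to its measured performance index; all data and the accuracy formula are synthetic:

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    # Toy data standing in for the evaluation model encoding set and the
    # performance index set.
    rng = np.random.default_rng(0)
    encodings = rng.choice([0.05, 0.35, 0.65, 0.95], size=(30, 4))
    performance = (0.9 - 0.1 * np.abs(encodings.mean(axis=1) - 0.5)
                   + rng.normal(0, 0.01, size=30))

    # The fitted regressor plays the role of the performance predictor:
    # it scores a candidate encoding without training the network itself.
    predictor = GradientBoostingRegressor().fit(encodings, performance)
    print(predictor.predict(np.array([[0.35, 0.65, 0.65, 0.95]])))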
According to an embodiment of the present disclosure, the first obtaining sub-module may include a third obtaining unit.
The third obtaining unit is configured to adjust the model parameters of the candidate model structure subset based on a re-parameterization method.
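The disclosure does not detail the re-parameterization method; one widely used instance is structural re-parameterization, such as folding a BatchNorm layer into the preceding convolution so that the pair becomes a single convolution with adjusted parameters. The PyTorch sketch below illustrates that specific, assumed instance (the fusion is performed under torch.no_grad()):

    import torch
    from torch import nn

    def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
        # Fold BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta into the
        # convolution producing y, yielding one equivalent convolution.
        fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                          conv.stride, conv.padding, bias=True)
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
        fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
        bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
        fused.bias.data = (bias - bn.running_mean) * scale + bn.bias
        return fused

    conv = nn.Conv2d(3, 8, 3, padding=1, bias=False)
    bn = nn.BatchNorm2d(8)
    _ = bn(conv(torch.randn(16, 3, 8, 8)))  # populate BN running statistics
    bn.eval()

    x = torch.randn(1, 3, 8, 8)
    with torch.no_grad():
        print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5))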
According to an embodiment of the present disclosure, the second determination module may include a second determination submodule and a third determination submodule.
The second determining submodule is configured to determine the information entropy of each model structure in the plurality of model structures included in the pruning search space.
The third determining submodule is configured to determine the evaluation model structure set from the pruning search space according to the information entropy of each model structure in the plurality of model structures included in the pruning search space.
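As one hedged interpretation of this selection strategy (the disclosure does not fix the entropy definition), the information entropy of a model structure can be computed from the distribution of its per-layer pruning rates, and the highest-entropy structures, i.e., those mixing the most distinct pruning rates, can be taken as the evaluation model structure set:

    import math
    from collections import Counter
    from itertools import product

    def structure_entropy(structure):
        # Shannon entropy of the per-layer pruning-rate distribution.
        counts = Counter(structure)
        total = len(structure)
        return -sum((n / total) * math.log2(n / total) for n in counts.values())

    rates = (0.05, 0.35, 0.65, 0.95)
    search_space = list(product(rates, repeat=4))
    evaluation_set = sorted(search_space, key=structure_entropy, reverse=True)[:8]
    print(evaluation_set[0], structure_entropy(evaluation_set[0]))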
According to an embodiment of the present disclosure, the second determination module may include a fourth determination submodule and a fifth determination submodule.
The fourth determining submodule is configured to determine at least one clustering center of the pruning search space according to the plurality of model structures included in the pruning search space.
The fifth determining submodule is configured to obtain the evaluation model structure set according to the at least one clustering center of the pruning search space.
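A natural, assumed realization of this clustering variant is k-means over the structure encodings, keeping the structure nearest each cluster center as a representative evaluation model structure; the number of clusters below is an illustrative choice:

    import numpy as np
    from itertools import product
    from sklearn.cluster import KMeans

    rates = (0.05, 0.35, 0.65, 0.95)
    encodings = np.array(list(product(rates, repeat=4)))  # 256 encodings

    # Cluster the pruning search space and keep, for each cluster center,
    # the encoding closest to it as an evaluation model structure.
    kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(encodings)
    evaluation_set = [encodings[np.argmin(np.linalg.norm(encodings - c, axis=1))]
                      for c in kmeans.cluster_centers_]
    print(np.array(evaluation_set))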
According to an embodiment of the present disclosure, the predetermined condition includes a performance evaluation condition and a pruning condition. The pruning condition includes a pruning rate value condition or a pruning round condition. The pruning information includes a pruning rate value or a pruning round.
According to an embodiment of the present disclosure, that the target model structure is a candidate model structure satisfying the predetermined condition among the plurality of candidate model structures includes one of the following:
the target model structure is a candidate model structure, among the plurality of candidate model structures, whose performance index satisfies the performance evaluation condition and whose pruning rate value satisfies the pruning rate value condition; or the target model structure is a candidate model structure, among the plurality of candidate model structures, whose performance index satisfies the performance evaluation condition and whose pruning round satisfies the pruning round condition.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method described above.
According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements the method described above.
FIG. 6 schematically illustrates a block diagram of an electronic device adapted to implement a method of generating a pre-trained model according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the electronic device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the methods and processes described above, such as the generation method of the pre-training model. For example, in some embodiments, the generation method of the pre-trained model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method for generating a pre-trained model described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g. by means of firmware) to perform the generation method of the pre-trained model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A method of generating a pre-trained model, comprising:
determining a pruning search space according to a pre-training model;
determining a set of candidate model structures from the pruning search space, wherein the set of candidate model structures comprises a plurality of candidate model structures; and
and in the case that it is determined that a target model structure does not exist in the candidate model structure set, training the candidate model structure set until a predetermined condition is met, so as to obtain a target pre-training model.
2. The method of claim 1, further comprising:
processing the candidate model structure set by using a performance predictor corresponding to the pruning search space to obtain a performance index set of the candidate model structure set;
wherein, in the case that it is determined that the target model structure does not exist in the candidate model structure set, the training the candidate model structure set until the predetermined condition is met to obtain the target pre-training model comprises:
in the case that it is determined, according to the performance index set and a pruning information set of the candidate model structure set, that the target model structure does not exist in the candidate model structure set, determining a candidate model structure subset from the candidate model structure set according to the performance index set and the pruning information set; and
training the candidate model structure subset until the predetermined condition is met, so as to obtain the target pre-training model.
3. The method of claim 2, further comprising:
determining a set of evaluation model structures from the pruning search space; and
obtaining a performance predictor corresponding to the pruning search space by using the evaluation model structure set.
4. The method of claim 3, wherein the obtaining a performance predictor corresponding to the pruning search space by using the evaluation model structure set comprises:
processing an evaluation set by utilizing the evaluation model structure set to obtain a performance index set corresponding to the pruning search space; and
obtaining the performance predictor corresponding to the pruning search space by using the evaluation model structure set and the performance index set.
5. The method of claim 4, wherein the obtaining the performance predictor corresponding to the pruning search space by using the evaluation model structure set and the performance index set comprises:
determining an evaluation model encoding set of the evaluation model structure set; and
obtaining the performance predictor corresponding to the pruning search space by using the evaluation model encoding set and the performance index set.
6. The method of any of claims 2-5, wherein the training the subset of candidate model structures comprises:
adjusting model parameters of the candidate model structure subset based on a re-parameterization method.
7. The method of any of claims 3-6, wherein the determining a set of evaluation model structures from the pruning search space comprises:
determining an information entropy of each model structure in a plurality of model structures included in the pruning search space; and
determining the evaluation model structure set from the pruning search space according to the information entropy of each model structure in the plurality of model structures included in the pruning search space.
8. The method of any of claims 3-6, wherein the determining a set of evaluation model structures from the pruning search space comprises:
determining at least one clustering center of the pruning search space according to a plurality of model structures included in the pruning search space; and
obtaining the evaluation model structure set according to the at least one clustering center of the pruning search space.
9. The method according to any one of claims 2 to 8, wherein the predetermined condition comprises a performance evaluation condition and a pruning condition;
the pruning condition comprises a pruning rate value condition or a pruning round condition;
the pruning information comprises a pruning rate value or a pruning round;
wherein the target model structure being a candidate model structure satisfying the predetermined condition among the plurality of candidate model structures comprises one of:
the target model structure is a candidate model structure of which the performance index meets the performance evaluation condition and the pruning rate value meets the pruning rate value condition in the plurality of candidate model structures; and
the target model structure is a candidate model structure in which the performance index satisfies the performance evaluation condition and the pruning round satisfies the pruning round condition among the plurality of candidate model structures.
10. An apparatus for generating a pre-trained model, comprising:
a first determining module, configured to determine a pruning search space according to a pre-training model;
a second determining module, configured to determine a set of candidate model structures from the pruning search space, where the set of candidate model structures includes a plurality of candidate model structures; and
a first obtaining module, configured to, in the case that it is determined that a target model structure does not exist in the candidate model structure set, train the candidate model structure set until a predetermined condition is met, so as to obtain a target pre-training model.
11. The apparatus of claim 10, further comprising:
a second obtaining module, configured to process the candidate model structure set by using a performance predictor corresponding to the pruning search space, to obtain a performance index set of the candidate model structure set;
wherein the first obtaining module includes:
a first determining submodule, configured to, in the case that it is determined, according to the performance index set and the pruning information set of the candidate model structure set, that the target model structure does not exist in the candidate model structure set, determine a candidate model structure subset from the candidate model structure set according to the performance index set and the pruning information set; and
a first obtaining submodule, configured to train the candidate model structure subset until the predetermined condition is met, so as to obtain the target pre-training model.
12. The apparatus of claim 11, further comprising:
a third determining module, configured to determine an evaluation model structure set from the pruning search space; and
a third obtaining module, configured to obtain a performance predictor corresponding to the pruning search space by using the evaluation model structure set.
13. The apparatus of claim 12, wherein the third obtaining module comprises:
a second obtaining submodule, configured to process an evaluation set by using the evaluation model structure set, so as to obtain a performance index set corresponding to the pruning search space; and
a third obtaining submodule, configured to obtain a performance predictor corresponding to the pruning search space by using the evaluation model structure set and the performance index set.
14. The apparatus of claim 13, wherein the third obtaining submodule comprises:
a second determining unit, configured to determine an evaluation model encoding set of the evaluation model structure set; and
a second obtaining unit, configured to obtain a performance predictor corresponding to the pruning search space by using the evaluation model encoding set and the performance index set.
15. The apparatus of any one of claims 11-14, wherein the first obtaining submodule comprises:
a third obtaining unit, configured to adjust the model parameters of the candidate model structure subset based on a re-parameterization method.
16. The apparatus of any of claims 12-15, wherein the second determining module comprises:
a second determining submodule, configured to determine an information entropy of each model structure in a plurality of model structures included in the pruning search space; and
a third determining submodule, configured to determine the evaluation model structure set from the pruning search space according to an information entropy of each model structure in the plurality of model structures included in the pruning search space.
17. The apparatus of any of claims 12-15, wherein the second determining module comprises:
a fourth determining submodule, configured to determine at least one clustering center of the pruning search space according to a plurality of model structures included in the pruning search space; and
a fifth determining submodule, configured to obtain the evaluation model structure set according to the at least one clustering center of the pruning search space.
18. The apparatus of any one of claims 11 to 17, wherein the predetermined condition comprises a performance evaluation condition and a pruning condition;
the pruning condition comprises a pruning rate value condition or a pruning round condition;
the pruning information comprises a pruning rate value or a pruning round;
wherein the target model structure being a candidate model structure satisfying the predetermined condition among the plurality of candidate model structures comprises one of:
the target model structure is a candidate model structure of which the performance index meets the performance evaluation condition and the pruning rate value meets the pruning rate value condition in the plurality of candidate model structures; and
the target model structure is a candidate model structure in which the performance index satisfies the performance evaluation condition and the pruning round satisfies the pruning round condition among the plurality of candidate model structures.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 9.
CN202111310486.5A 2021-11-05 2021-11-05 Pre-training model generation method and device, electronic equipment and storage medium Pending CN114037061A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111310486.5A CN114037061A (en) 2021-11-05 2021-11-05 Pre-training model generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111310486.5A CN114037061A (en) 2021-11-05 2021-11-05 Pre-training model generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114037061A true CN114037061A (en) 2022-02-11

Family

ID=80143252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111310486.5A Pending CN114037061A (en) 2021-11-05 2021-11-05 Pre-training model generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114037061A (en)

Similar Documents

Publication Publication Date Title
EP4064277B1 (en) Method and apparatus for training speech recognition model, device and storage medium
CN112560996A (en) User portrait recognition model training method, device, readable storage medium and product
CN114118287A (en) Sample generation method, sample generation device, electronic device and storage medium
CN114065864A (en) Federal learning method, federal learning device, electronic device, and storage medium
CN114037059A (en) Pre-training model, model generation method, data processing method and data processing device
CN113642727A (en) Training method of neural network model and processing method and device of multimedia information
CN114037060A (en) Pre-training model generation method and device, electronic equipment and storage medium
CN113516185B (en) Model training method, device, electronic equipment and storage medium
CN113361621B (en) Method and device for training model
CN115203564A (en) Information flow recommendation method and device and computer program product
CN114610953A (en) Data classification method, device, equipment and storage medium
CN114037057B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN114037061A (en) Pre-training model generation method and device, electronic equipment and storage medium
CN112560987A (en) Image sample processing method, device, equipment, storage medium and program product
CN114021642A (en) Data processing method and device, electronic equipment and storage medium
CN113961765A (en) Searching method, device, equipment and medium based on neural network model
CN113987260A (en) Video pushing method and device, electronic equipment and storage medium
CN114037058B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN113326885A (en) Method and device for training classification model and data classification
CN112905885A (en) Method, apparatus, device, medium, and program product for recommending resources to a user
US20230206075A1 (en) Method and apparatus for distributing network layers in neural network model
CN114821801B (en) Motion recognition method, model training method, device, electronic device and storage medium
CN116151392B (en) Training sample generation method, training method, recommendation method and device
CN115034388B (en) Determination method and device for quantization parameters of ranking model and electronic equipment
EP4134834A1 (en) Method and apparatus of processing feature information, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination