CN113657468A - Pre-training model generation method and device, electronic equipment and storage medium - Google Patents

Pre-training model generation method and device, electronic equipment and storage medium

Info

Publication number
CN113657468A
Authority
CN
China
Prior art keywords
training
network
model
candidate model
hyper
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110866832.1A
Other languages
Chinese (zh)
Inventor
希滕
张刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110866832.1A
Publication of CN113657468A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The present disclosure provides a method and an apparatus for generating a pre-training model, an electronic device, and a storage medium, relating to the technical field of artificial intelligence, in particular to computer vision and deep learning, and applicable to scenarios such as image processing and image recognition. The scheme is as follows: a corresponding training task is executed on a super-network comprising multiple models; part of the models are selected from the trained super-network and combined to obtain multiple groups of candidate model combinations; each group of candidate model combinations is used to extract features from a first sample set; a target model combination is selected from the groups according to the information entropy of the features each group extracts; and a pre-training model is generated according to the target model combination. Because the multiple models are trained through the super-network, the training speed and the relevance among the models are improved, and because the information content of each combination's extracted features is evaluated through information entropy, the best model combination is screened out and the precision of the pre-training model is improved.

Description

Pre-training model generation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to computer vision and deep learning, applicable to scenarios such as image processing and image recognition, and more specifically to a method and an apparatus for generating a pre-training model, an electronic device, and a storage medium.
Background
In recent years, pre-training models have achieved great success. A pre-training model is trained on an upstream task with a large amount of data; on a downstream task, good results can then be obtained with only a small amount of training data. However, pre-training models in the related art are severely limited when migrating to new scenarios and cannot meet precision requirements. How to improve the precision of the generated pre-training model is therefore a technical problem that urgently needs to be solved.
Disclosure of Invention
The disclosure provides a generation method and device of a pre-training model, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a method for generating a pre-training model, including:
executing a corresponding training task on the super network to obtain a trained super network; wherein the super network comprises a plurality of models;
selecting at least part of models from the trained hyper-network to perform model combination to obtain a plurality of groups of candidate model combinations;
performing feature extraction on the first sample set by adopting each group of candidate model combination;
selecting a target model combination from the multiple groups of candidate model combinations according to the information entropy of the features extracted by each group of candidate model combinations;
and generating a pre-training model according to the target model combination.
According to another aspect of the present disclosure, there is provided a generation apparatus of a pre-training model, including:
the training module is used for executing a corresponding training task on the super network to obtain a trained super network; wherein the super network comprises a plurality of models;
the combination module is used for selecting at least part of models from the trained hyper-network to carry out model combination to obtain a plurality of groups of candidate model combinations;
the characteristic extraction module is used for extracting characteristics of the first sample set by adopting each group of candidate model combinations;
the selection module is used for selecting a target model combination from the multiple groups of candidate model combinations according to the information entropy of the features extracted by each group of candidate model combinations;
and the generating module is used for generating a pre-training model according to the target model combination.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the preceding aspect.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the preceding aspect.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of the preceding aspect.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flow chart of a method for generating a pre-training model according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart illustrating another method for generating a pre-training model according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart diagram illustrating another method for generating a pre-training model according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a device for generating a pre-training model according to an embodiment of the present disclosure;
fig. 5 is a schematic block diagram of an example electronic device 500 provided by embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
A method, an apparatus, an electronic device, and a storage medium for generating a pre-training model according to embodiments of the present disclosure are described below with reference to the drawings.
Fig. 1 is a schematic flow chart of a method for generating a pre-training model according to an embodiment of the present disclosure.
As shown in fig. 1, the method comprises the following steps:
Step 101, executing a corresponding training task on the super network to obtain a trained super network, wherein the super network comprises a plurality of models.
The training task is related to a business scenario, such as an image processing task and an image recognition task. The plurality of models included in the super network are models of neural networks.
In the disclosed embodiment, the super-network is a mechanism for accelerating model training. The super-network is not a specific network architecture but a set containing multiple models of the same type with different parameters. By training the super-network from its initial state, the parameters of every model in the super-network are adjusted; the trained super-network still contains these models, and adjusting their parameters is what trains them. Therefore, in a scenario where a set of models must be trained, executing the corresponding training task on the super-network trains all of the models contained in the super-network at once, which is faster than training each model separately. Meanwhile, during training of the super-network of the present disclosure, the complementary relationships among the models are established, so that combinations of these models achieve higher precision and the performance of model combination is improved.
As a possible implementation, the super-network can be trained based on the one-shot Neural Architecture Search (NAS) idea: the corresponding training data is input into the super-network only once, i.e., the super-network parameters are adjusted in a single pass, and the network can converge without repeated iterative training, which improves the training speed. The training method of the super-network is described in detail in the following embodiments.
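To make the one-shot idea concrete, the following is a minimal sketch, not the patent's implementation: a supernet whose candidate models are paths through shared layer choices, trained in a single pass so that every candidate is updated at once. PyTorch is assumed, and the class names, dimensions, and synthetic data are illustrative.

```python
# Minimal sketch of one-shot supernet training (illustrative names; PyTorch assumed).
# Each "model" in the supernet is one choice of block per layer; all choices share the
# supernet's parameter store, so a single training pass updates every candidate model.
import random
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class SuperLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_choices=3):
        super().__init__()
        # Candidate blocks of the same type but with different parameters.
        self.choices = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU()) for _ in range(num_choices)
        )

    def forward(self, x, choice):
        return self.choices[choice](x)

class SuperNet(nn.Module):
    def __init__(self, dims=(128, 256, 256), num_classes=10, num_choices=3):
        super().__init__()
        self.num_choices = num_choices
        self.layers = nn.ModuleList(
            SuperLayer(dims[i], dims[i + 1], num_choices) for i in range(len(dims) - 1)
        )
        self.head = nn.Linear(dims[-1], num_classes)

    def forward(self, x, path):
        for layer, choice in zip(self.layers, path):
            x = layer(x, choice)
        return self.head(x)

# Synthetic stand-in data; a real training task would supply its own sample set.
dataset = TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,)))
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)

supernet = SuperNet()
optimizer = torch.optim.SGD(supernet.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# One-shot idea: a single pass over the training data; per batch, one sub-model
# (a path of block choices) is sampled and its shared parameters are updated,
# which implicitly trains every other sub-model that reuses those parameters.
for inputs, labels in train_loader:
    path = [random.randrange(supernet.num_choices) for _ in supernet.layers]
    loss = criterion(supernet(inputs, path), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because every candidate reuses the same shared blocks, the sketch also illustrates why the number of adjustable parameters grows much more slowly than the number of candidate models.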
Step 102, selecting at least part of the models from the trained super-network for model combination to obtain multiple groups of candidate model combinations.
In the embodiment of the disclosure, a random search algorithm, an evolutionary search algorithm, an ant colony search algorithm, or a reinforcement learning algorithm may be adopted to obtain multiple groups of candidate model combinations from the trained super-network according to a set number of models.
The set number of models may cover only part of the models in the super-network or all of the models in the super-network.
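As an illustration of the simplest of these options, the sketch below samples candidate model combinations from the trained supernet by random search; `sample_model` stands in for whatever routine draws one trained sub-model (e.g. a path of block choices) from the supernet and is not part of the patent.

```python
# Hypothetical sketch of step 102: random search over model combinations.

def sample_candidate_combinations(supernet, num_groups, models_per_group, sample_model):
    """Return `num_groups` candidate model combinations, each a list of
    `models_per_group` sub-models drawn from the trained supernet."""
    combinations = []
    for _ in range(num_groups):
        # Random search: each combination is an independent random draw of sub-models.
        combo = [sample_model(supernet) for _ in range(models_per_group)]
        combinations.append(combo)
    return combinations
```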
And 103, performing feature extraction on the first sample set by adopting each group of candidate model combinations.
Wherein the first sample set contains a plurality of sample images.
In the embodiment of the present disclosure, each group of candidate model combinations may correspond to the same sample in the first sample set, or may be different samples, and meanwhile, the corresponding sample may be one or multiple samples. For each candidate model combination, the models included in the candidate model combination may correspond to the same sample or may correspond to different samples.
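The sketch below illustrates this step under the simplest of those choices: every model in a combination sees the same samples, each sub-model is a callable returning a feature vector, and the combination's feature is the concatenation of the per-model features. None of these conventions is fixed by the patent.

```python
# Hypothetical sketch of step 103: extract features from the first sample set
# with one candidate model combination, concatenating the per-model features.
import torch

def extract_features(candidate_combination, first_sample_set):
    all_features = []
    with torch.no_grad():
        for sample in first_sample_set:                       # sample: image/feature tensor
            per_model = [model(sample.unsqueeze(0)) for model in candidate_combination]
            all_features.append(torch.cat(per_model, dim=-1).squeeze(0))
    return torch.stack(all_features)                          # (num_samples, total_feature_dim)
```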
And 104, selecting a target model combination from the multiple groups of candidate model combinations according to the information entropy of the features extracted by each group of candidate model combinations.
In the embodiment of the disclosure, the advantages and disadvantages of the features extracted by the model combination are evaluated through an information theory, wherein the larger the information entropy of the features is, the larger the information content contained in the features extracted by the model combination is, and the better the performance of the model combination is, so that the precision of the model combination obtained through screening is higher, namely the precision of the pre-training model is improved.
Step 105, generating a pre-training model according to the target model combination.
In the pre-training model generation method, a corresponding training task is executed on a super-network comprising multiple models; part of the models are selected from the trained super-network and combined to obtain multiple groups of candidate model combinations; each group of candidate model combinations is used to extract features from the first sample set; a target model combination is selected from the groups according to the information entropy of the features each group extracts; and the pre-training model is generated according to the target model combination. Because the multiple models are trained through the super-network, the training speed and the relevance among the models are improved, and because the information content of each combination's extracted features is evaluated through information entropy, the best model combination is screened out and the precision of the pre-training model is improved.
Based on the foregoing embodiments, this embodiment provides another pre-training model generation method in which there are multiple super-networks. Fig. 2 is a schematic flow chart of another method for generating a pre-training model according to an embodiment of the present disclosure. As shown in Fig. 2, the method includes the following steps:
Step 201, inputting training samples in the second sample set into each super network.
Wherein, the super network comprises a plurality of models.
In this embodiment of the present disclosure, the first sample set, the second sample set, and the third sample set may be the same sample set or different sample sets, and this embodiment is not limited thereto.
Step 202, fusing the features output by each super-network to obtain a fused feature.
Step 203, executing a plurality of training tasks according to the fused feature to obtain the prediction information of each training task.
The number of training tasks may be greater than or equal to the number of super networks.
In the embodiment of the disclosure, the multiple training tasks are executed on the fused feature obtained by fusing the features output by each super-network, and the prediction information of each training task is obtained, so that the multiple training tasks are in effect executed on the models included in every super-network. Because several training tasks are executed, the models contained in each super-network learn to extract features for multiple tasks, which increases the range of scenarios each super-network can adapt to.
Step 204, determining a loss function value of each training task according to the difference between the prediction information of each training task and the standard information of the corresponding training task.
The standard information of the samples in the second sample set corresponds to the training tasks: different training tasks have different standard information for the same sample, i.e., for each training task the samples carry their own standard information.
Step 205, carrying out weighted summation on the loss function values of the training tasks to obtain a total loss function value, and updating the parameters of the super-networks according to the total loss function value.
As one implementation, the loss function values of all training tasks can be fused with equal weights (an average) to obtain the total loss function value. As another implementation, the weight of each training task's loss function value may be set according to the task's preset importance, i.e., the more important the task, the larger the weight of its loss function value; the weighted loss function values are then summed to obtain the total loss function value. The parameters of each super-network are then updated according to the total loss function value, so that the parameters of every model contained in every super-network are adjusted. Because training the super-networks in this way also takes into account the parameter relationships among the super-networks and among the models within each super-network, the speed and precision of large-scale model training are improved, and the complementarity among the combined models is improved when candidate model combinations are later selected from the super-networks.
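The sketch below walks through steps 201-205 under assumed PyTorch-style interfaces: the task heads, task weights, optimizer, and label dictionaries are illustrative, and feature fusion is done by simple concatenation, which is only one possible fusion rule.

```python
# Hypothetical sketch of steps 201-205: joint multi-task training of several
# supernets on a fused feature with a weighted total loss.
import torch
import torch.nn as nn

def multi_task_train_step(supernets, task_heads, task_weights, batch, labels_per_task, optimizer):
    criterion = nn.CrossEntropyLoss()
    # Step 201: feed the same training samples into every supernet.
    outputs = [net(batch) for net in supernets]
    # Step 202: fuse the supernet outputs (concatenation is one simple choice).
    fused = torch.cat(outputs, dim=-1)
    # Steps 203-204: run each training task on the fused feature and score it
    # against that task's standard (ground-truth) information.
    losses = [criterion(head(fused), labels_per_task[name]) for name, head in task_heads.items()]
    # Step 205: weighted sum of the task losses; one backward pass updates all supernets.
    # task_weights is assumed to be aligned with task_heads' iteration order, and the
    # optimizer is assumed to hold every supernet's and head's parameters.
    total_loss = sum(w * l for w, l in zip(task_weights, losses))
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```

Because a single optimizer step flows through the shared, fused feature, every supernet receives gradient signal from every task, which is one way to read the patent's claim that the tasks are trained jointly.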
It should be noted that the super-network improves the training speed of each model: when the models in the super-network are adjusted through the fused loss function, the adjustment of multiple models is realized through parameter sharing among the models, which reduces the total number of adjustable parameters and speeds up the training of each model. Moreover, because the parameters of the models in the super-network are shared, the models complement one another as their parameters are adjusted, so that subsequently combined models achieve higher precision and the performance of model combination is improved.
Step 206, selecting at least part of the models from the trained super-networks for model combination to obtain multiple groups of candidate model combinations.
In the embodiment of the present disclosure, a candidate model combination may consist of models selected from a single super-network or of models selected from multiple super-networks.
Step 207, performing feature extraction on the first sample set by adopting each group of candidate model combinations.
Step 208, for any group of candidate model combinations, determining the feature distribution, in feature space, of the features extracted from the first sample set, and determining the information entropy of the features extracted by that group according to the feature distribution.
In an implementation of the embodiment of the present disclosure, for a group of candidate model combinations, the distribution of its extracted features in feature space is determined, and the information entropy of the features is computed from the mean and variance of that distribution. The information entropy measures how much information the extracted features contain: the more information, the better the features extracted by the model combination and the higher the performance of the candidate combination, for example its accuracy. A minimal sketch of such an entropy estimate is given below.
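The sketch assumes the feature distribution is modeled as an axis-aligned Gaussian, so the entropy follows from the per-dimension variances; this is just one way to turn "mean and variance of the feature distribution" into a number and is not prescribed by the patent.

```python
# Hypothetical sketch of step 208: information entropy of a combination's features
# under a diagonal-Gaussian model of their distribution in feature space.
import math
import torch

def feature_entropy(features, eps=1e-8):
    """features: (num_samples, feature_dim) tensor extracted by one candidate combination."""
    var = features.var(dim=0, unbiased=True) + eps        # per-dimension variance
    # Differential entropy of a diagonal Gaussian: 0.5 * sum_i log(2*pi*e*var_i).
    return 0.5 * torch.sum(torch.log(2 * math.pi * math.e * var)).item()
```

Under this Gaussian assumption, combinations whose features spread more widely across feature space receive a higher entropy score, matching the intuition that more informative features indicate a better combination.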
Step 209, selecting a target model combination from the plurality of sets of candidate model combinations according to the information entropy of the features extracted by each set of candidate model combinations.
As one implementation, the candidate model combination whose extracted features have the largest information entropy is selected from all candidate model combinations and taken as the target model combination. The larger the information entropy of the extracted features, the more information the features extracted by that combination contain, so the best-performing candidate combination is chosen and the precision of the target model combination is improved.
As another implementation, the candidate model combination with the largest information entropy is selected from all candidate model combinations, the computation latency it requires to extract features from the first sample set is obtained, and it is taken as the target model combination only after determining that this latency is less than or equal to a set duration. The reason is that several candidate model combinations may tie for the largest information entropy; to screen out a better one, the feature-extraction latency of each such combination is measured, combinations whose latency exceeds the set duration are removed, and a combination with the largest information entropy whose latency is within the set duration is taken as the target model combination. The determined target model combination therefore not only has high precision but also meets the speed requirement for feature extraction.
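Putting the two implementations together, a possible selection routine might look as follows; the helper names `extract_features` and `feature_entropy` refer to the earlier sketches, and the latency measurement and tie handling are assumptions rather than the patent's prescribed procedure.

```python
# Hypothetical sketch of step 209 with the optional latency check: rank combinations
# by feature entropy and, among those tied at the maximum, keep one whose
# feature-extraction latency fits the budget.
import time

def select_target_combination(combinations, first_sample_set, max_latency_s,
                              extract_features, feature_entropy):
    scored = []
    for combo in combinations:
        start = time.perf_counter()
        feats = extract_features(combo, first_sample_set)
        latency = time.perf_counter() - start             # computation latency for this combination
        scored.append((feature_entropy(feats), latency, combo))

    best_entropy = max(entropy for entropy, _, _ in scored)
    # Among max-entropy combinations, keep those within the set duration.
    feasible = [combo for entropy, latency, combo in scored
                if abs(entropy - best_entropy) < 1e-9 and latency <= max_latency_s]
    return feasible[0] if feasible else None
```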
Step 210, generating a pre-training model according to the target model combination.
In the pre-training model generation method of the disclosed embodiment, the features output by multiple super-networks are fused to obtain a fused feature, and multiple training tasks are executed on the fused feature, so that each super-network's performance on the different training tasks is obtained, as indicated by the loss function of each jointly trained task. The loss function values of the training tasks are then weighted and summed to obtain a total loss function, and the parameters of each super-network are updated with it, which establishes relevance and complementarity among the models and realizes fast training of many models. As a result, when models from the multiple super-networks are subsequently combined, the combined model that can serve as the pre-training model has higher precision at the same speed, or higher speed at the same precision, which improves the speed at which the model processes images or audio/video on specific hardware or chips. Meanwhile, compared with the related-art approach of pre-training a model on a single task, which limits the applicable scenarios, the performance of many models on different training tasks can be obtained through each super-network's performance on those tasks, so the models can adapt to the scenarios of a variety of tasks.
Based on the foregoing embodiment, this embodiment provides another pre-training model generation method in which there are multiple super-networks, each with a corresponding training task. Fig. 3 is a schematic flow chart of another method for generating a pre-training model according to an embodiment of the present disclosure. As shown in Fig. 3, the method includes the following steps:
Step 301, inputting the training samples in the third sample set into each super network to obtain the features output by each super network.
Wherein, the super network comprises a plurality of models.
The second sample set and the third sample set in the embodiment of the present disclosure may be the same sample set, and the embodiment of the present disclosure is not limited thereto.
Step 302, executing corresponding training tasks according to the features output by each super-network to obtain the prediction information of each training task.
Step 303, determining a loss function value of each training task according to the difference between the prediction information of each training task and the standard information of the corresponding training task.
Step 304, updating the parameters of the corresponding super-network according to the loss function value of each training task.
In the embodiment of the disclosure, each super-network is trained on its corresponding training task to obtain a trained super-network, so that when super-network training completes, the multiple models in the super-network have also been trained on that task; compared with training each model independently, this improves the training speed of the multiple models in large-scale model training scenarios.
It should be noted that the super-network improves the training speed of each model: when the models in the super-network are adjusted through the loss function, the adjustment of multiple models is realized through parameter sharing among the models, which reduces the total number of adjustable parameters and speeds up the training of each model. In addition, because the parameters of the models in the super-network are shared, the models complement one another as their parameters are adjusted, so that subsequently combined models achieve higher precision and the performance of model combination is improved.
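For contrast with the fused multi-task variant above, here is a sketch of steps 301-304 under assumed interfaces: each supernet is paired with its own task head, labels, and optimizer (all illustrative names), and only that supernet's parameters are updated by its task's loss.

```python
# Hypothetical sketch of steps 301-304: per-task training, one task per supernet.
import torch.nn as nn

def per_task_train_step(supernets, task_heads, optimizers, batch, labels_per_task):
    criterion = nn.CrossEntropyLoss()
    for name, net in supernets.items():
        features = net(batch)                                  # step 301: this supernet's features
        predictions = task_heads[name](features)               # step 302: its own training task
        loss = criterion(predictions, labels_per_task[name])   # step 303: per-task loss
        optimizers[name].zero_grad()
        loss.backward()                                        # step 304: update this supernet only
        optimizers[name].step()
```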
Step 305, selecting at least part of the models from the trained super-networks for model combination to obtain multiple groups of candidate model combinations.
Step 306, performing feature extraction on the first sample set by adopting each group of candidate model combinations.
Step 307, for any group of candidate model combinations, determining the feature distribution of the features extracted from the first sample set in the feature space, and determining the information entropy of the features extracted from the group of candidate model combinations according to the feature distribution.
Step 308, selecting a target model combination from the multiple groups of candidate model combinations according to the information entropy of the features extracted by each group of candidate model combinations.
Step 309, generating a pre-training model according to the target model combination.
For steps 305 to 309, refer to the explanation in the foregoing method embodiments; the principle is the same and is not repeated here.
In the pre-training model generation method of the disclosed embodiment, the performance of the multiple models in each super-network on its corresponding training task can be obtained from the super-network's performance on that task, so the models adapt to the scenario of the corresponding task. This realizes fast training of many models while establishing relevance and complementarity among them, so that when models from the multiple super-networks are subsequently combined, the precision of the combined model is improved and the range of adaptable task scenarios is increased.
In order to implement the foregoing embodiments, the present disclosure provides a device for generating a pre-training model.
Fig. 4 is a schematic structural diagram of a device for generating a pre-training model according to an embodiment of the present disclosure.
As shown in fig. 4, the apparatus includes:
a training module 41, configured to execute a corresponding training task on the super network to obtain a trained super network; wherein the super network comprises a plurality of models;
a combination module 42, configured to select at least part of models from the trained super-network to perform model combination, so as to obtain multiple sets of candidate model combinations;
a feature extraction module 43, configured to perform feature extraction on the first sample set by using each group of the candidate model combinations;
a selecting module 44, configured to select a target model combination from multiple sets of candidate model combinations according to information entropy of features extracted by each set of candidate model combinations;
and a generating module 45, configured to generate a pre-training model according to the target model combination.
Further, as an implementation manner, the apparatus further includes:
a determining module, configured to determine, for an arbitrary set of candidate model combinations, a feature distribution of features extracted for the first sample set in a feature space; and determining the information entropy of the features extracted by the group of candidate model combinations according to the feature distribution.
As an implementation, the selecting module 44 is configured to:
and selecting the candidate model combination with the largest information entropy from all the candidate model combinations, and taking the candidate model combination with the largest information entropy as the target model combination.
As an implementation, the selecting module 44 is further configured to:
obtaining the calculation time delay required by feature extraction of the candidate model combination with the maximum information entropy on the first sample set; and determining that the calculation time delay of the candidate model combination with the maximum information entropy is less than or equal to a set time length.
As an implementation, the number of the super networks is multiple, and the training module 41 is further configured to:
inputting training samples in a second sample set into each super network; fusing the characteristics output by each hyper-network to obtain fused characteristics; executing a plurality of training tasks according to the fusion characteristics to obtain the prediction information of each training task; determining a loss function value of each training task according to the difference between the prediction information of each training task and the standard information of the corresponding training task; weighting and summing the loss function values of the training tasks to obtain a total loss function value; and updating the parameters of each hyper-network according to the total loss function value.
As an implementation manner, there are a plurality of super networks, each super network has a corresponding training task, and the training module 41 is further configured to:
inputting training samples in a third sample set into each super network to obtain characteristics output by each super network; executing corresponding training tasks according to the characteristics output by each hyper-network to obtain the prediction information of each training task; determining a loss function value of each training task according to the difference between the prediction information of each training task and the standard information of the corresponding training task; and updating the corresponding parameters of the hyper-network according to the loss function values of the training tasks.
It should be noted that the foregoing explanation of the method embodiments also applies to the apparatus of this embodiment; the principle is the same and is not repeated here.
In the pre-training model generation apparatus of the disclosed embodiment, a corresponding training task is executed on a super-network comprising multiple models; part of the models are selected from the trained super-network and combined to obtain multiple groups of candidate model combinations; each group of candidate model combinations is used to extract features from the first sample set; a target model combination is selected from the groups according to the information entropy of the features each group extracts; and the pre-training model is generated according to the target model combination. Because the multiple models are trained through the super-network, the training speed and the relevance among the models are improved, and because the information content of the extracted features is evaluated through information entropy, the best model combination is screened out and the precision of the pre-training model is improved.
In order to implement the above embodiments, an embodiment of the present disclosure provides an electronic device, including:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the foregoing method embodiments.
To achieve the above embodiments, the present disclosure provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the method of the foregoing method embodiments.
To implement the above embodiments, the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the method described in the foregoing method embodiments.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 5 is a schematic block diagram of an example electronic device 500 provided by embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 includes a computing unit 501, which can perform various appropriate actions and processes in accordance with a computer program stored in a ROM (Read-Only Memory) 502 or a computer program loaded from a storage unit 508 into a RAM (Random Access Memory) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The calculation unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An I/O (Input/Output) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing Unit 501 include, but are not limited to, a CPU (Central Processing Unit), a GPU (graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing Units running machine learning model algorithms, a DSP (Digital Signal Processor), and any suitable Processor, controller, microcontroller, and the like. The calculation unit 501 performs the respective methods and processes described above, such as the generation method of the pre-training model. For example, in some embodiments, the generation method of the pre-trained model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the method of generating a pre-trained model described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the generation method of the pre-trained model in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, FPGAs (Field Programmable Gate Arrays), ASICs (Application-Specific Integrated Circuits), ASSPs (Application Specific Standard Products), SOCs (Systems On Chip), CPLDs (Complex Programmable Logic Devices), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an EPROM (Electrically Programmable Read-Only-Memory) or flash Memory, an optical fiber, a CD-ROM (Compact Disc Read-Only-Memory), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a Display device (e.g., a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: LAN (Local Area Network), WAN (Wide Area Network), internet, and blockchain Network.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of traditional physical hosts and VPS ("Virtual Private Server") services, namely high management difficulty and weak service scalability. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be noted that artificial intelligence is the discipline of studying how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technologies, and the like.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A method of generating a pre-trained model, comprising:
executing a corresponding training task on the super network to obtain a trained super network; wherein the super network comprises a plurality of models;
selecting at least part of models from the trained hyper-network to perform model combination to obtain a plurality of groups of candidate model combinations;
performing feature extraction on the first sample set by adopting each group of candidate model combination;
selecting a target model combination from the multiple groups of candidate model combinations according to the information entropy of the features extracted by each group of candidate model combinations;
and generating a pre-training model according to the target model combination.
2. The method according to claim 1, wherein before selecting the target model combination from the plurality of candidate model combinations according to the entropy of the features extracted from each of the candidate model combinations, the method further comprises:
determining the feature distribution of the features extracted from the first sample set in the feature space for an arbitrary group of candidate model combinations;
and determining the information entropy of the features extracted by the group of candidate model combinations according to the feature distribution.
3. The method according to claim 1, wherein the selecting a target model combination from the plurality of candidate model combinations according to the information entropy of the features extracted by each candidate model combination comprises:
selecting the candidate model combination with the largest information entropy from all the candidate model combinations;
and taking the candidate model combination with the maximum information entropy as the target model combination.
4. The method according to claim 3, wherein before the candidate model combination with the largest information entropy is taken as the target model combination, the method further comprises:
obtaining the calculation time delay required by feature extraction of the candidate model combination with the maximum information entropy on the first sample set;
and determining that the calculation time delay of the candidate model combination with the maximum information entropy is less than or equal to a set time length.
5. The method of any of claims 1-4, wherein the hyper-network is plural; the executing a corresponding training task on the super network to obtain a trained super network includes:
inputting training samples in a second sample set into each super network;
fusing the characteristics output by each hyper-network to obtain fused characteristics;
executing a plurality of training tasks according to the fusion characteristics to obtain the prediction information of each training task;
determining a loss function value of each training task according to the difference between the prediction information of each training task and the standard information of the corresponding training task;
weighting and summing the loss function values of the training tasks to obtain a total loss function value;
and updating the parameters of each hyper-network according to the total loss function value.
6. The method of any of claims 1-4, wherein the super network is plural, each of the super networks having a corresponding training task; the executing a corresponding training task on the super network to obtain a trained super network includes:
inputting training samples in a third sample set into each super network to obtain characteristics output by each super network;
executing corresponding training tasks according to the characteristics output by each hyper-network to obtain the prediction information of each training task;
determining a loss function value of each training task according to the difference between the prediction information of each training task and the standard information of the corresponding training task;
and updating the corresponding parameters of the hyper-network according to the loss function values of the training tasks.
7. An apparatus for generating a pre-trained model, comprising:
the training module is used for executing a corresponding training task on the super network to obtain a trained super network; wherein the super network comprises a plurality of models;
the combination module is used for selecting at least part of models from the trained hyper-network to carry out model combination to obtain a plurality of groups of candidate model combinations;
the characteristic extraction module is used for extracting characteristics of the first sample set by adopting each group of candidate model combinations;
the selection module is used for selecting a target model combination from the multiple groups of candidate model combinations according to the information entropy of the features extracted by each group of candidate model combinations;
and the generating module is used for generating a pre-training model according to the target model combination.
8. The apparatus of claim 7, wherein the apparatus further comprises:
a determining module, configured to determine, for an arbitrary set of candidate model combinations, a feature distribution of features extracted for the first sample set in a feature space; and determining the information entropy of the features extracted by the group of candidate model combinations according to the feature distribution.
9. The apparatus of claim 7, wherein the selecting module is configured to:
selecting the candidate model combination with the largest information entropy from all the candidate model combinations;
and taking the candidate model combination with the maximum information entropy as the target model combination.
10. The apparatus of claim 9, wherein the selecting module is further configured to:
obtaining the calculation time delay required by feature extraction of the candidate model combination with the maximum information entropy on the first sample set;
and determining that the calculation time delay of the candidate model combination with the maximum information entropy is less than or equal to a set time length.
11. The apparatus of any of claims 7-10, wherein the super network is plural; the training module is further configured to:
inputting training samples in a second sample set into each super network;
fusing the characteristics output by each hyper-network to obtain fused characteristics;
executing a plurality of training tasks according to the fusion characteristics to obtain the prediction information of each training task;
determining a loss function value of each training task according to the difference between the prediction information of each training task and the standard information of the corresponding training task;
weighting and summing the loss function values of the training tasks to obtain a total loss function value;
and updating the parameters of each hyper-network according to the total loss function value.
12. The apparatus of any of claims 7-10, wherein the super network is plural, each of the super networks having a corresponding training task; the training module is further configured to:
inputting training samples in a third sample set into each super network to obtain characteristics output by each super network;
executing corresponding training tasks according to the characteristics output by each hyper-network to obtain the prediction information of each training task;
determining a loss function value of each training task according to the difference between the prediction information of each training task and the standard information of the corresponding training task;
and updating the corresponding parameters of the hyper-network according to the loss function values of the training tasks.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
CN202110866832.1A 2021-07-29 2021-07-29 Pre-training model generation method and device, electronic equipment and storage medium Pending CN113657468A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110866832.1A CN113657468A (en) 2021-07-29 2021-07-29 Pre-training model generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110866832.1A CN113657468A (en) 2021-07-29 2021-07-29 Pre-training model generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113657468A true CN113657468A (en) 2021-11-16

Family

ID=78479012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110866832.1A Pending CN113657468A (en) 2021-07-29 2021-07-29 Pre-training model generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113657468A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200372684A1 (en) * 2019-05-22 2020-11-26 Fujitsu Limited Image coding apparatus, probability model generating apparatus and image compression system
WO2020253127A1 (en) * 2019-06-21 2020-12-24 深圳壹账通智能科技有限公司 Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
CN111340221A (en) * 2020-02-25 2020-06-26 北京百度网讯科技有限公司 Method and device for sampling neural network structure
CN111667056A (en) * 2020-06-05 2020-09-15 北京百度网讯科技有限公司 Method and apparatus for searching model structure
CN111860495A (en) * 2020-06-19 2020-10-30 上海交通大学 Hierarchical network structure searching method and device and readable storage medium
CN111783950A (en) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 Model obtaining method, device, equipment and storage medium based on hyper network
CN111553480A (en) * 2020-07-10 2020-08-18 腾讯科技(深圳)有限公司 Neural network searching method and device, computer readable medium and electronic equipment
CN112559870A (en) * 2020-12-18 2021-03-26 北京百度网讯科技有限公司 Multi-model fusion method and device, electronic equipment and storage medium
CN112784961A (en) * 2021-01-21 2021-05-11 北京百度网讯科技有限公司 Training method and device for hyper network, electronic equipment and storage medium
CN112801287A (en) * 2021-01-26 2021-05-14 商汤集团有限公司 Neural network performance evaluation method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHRIS ZHANG et al.: "Graph HyperNetworks for Neural Architecture Search", ARXIV, 18 December 2020 (2020-12-18) *
王进; 刘彬; 张军; 陈乔松; 邓欣: "Subspace fusion evolutionary hypernetwork for microarray data classification" (用于微阵列数据分类的子空间融合演化超网络), Acta Electronica Sinica (电子学报), no. 10, 15 October 2016 (2016-10-15)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115795314A (en) * 2023-02-07 2023-03-14 山东海量信息技术研究院 Key sample sampling method, system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113657465B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN112487173B (en) Man-machine conversation method, device and storage medium
CN110795569A (en) Method, device and equipment for generating vector representation of knowledge graph
CN113343803A (en) Model training method, device, equipment and storage medium
JP7414907B2 (en) Pre-trained model determination method, determination device, electronic equipment, and storage medium
CN113627536B (en) Model training, video classification method, device, equipment and storage medium
CN113657467B (en) Model pre-training method and device, electronic equipment and storage medium
CN112560985A (en) Neural network searching method and device and electronic equipment
CN113538235A (en) Training method and device of image processing model, electronic equipment and storage medium
CN112949818A (en) Model distillation method, device, equipment and storage medium
CN112949433B (en) Method, device and equipment for generating video classification model and storage medium
CN113965313A (en) Model training method, device, equipment and storage medium based on homomorphic encryption
CN114020950A (en) Training method, device and equipment of image retrieval model and storage medium
CN114186681A (en) Method, apparatus and computer program product for generating model clusters
CN113641829A (en) Method and device for training neural network of graph and complementing knowledge graph
CN113657468A (en) Pre-training model generation method and device, electronic equipment and storage medium
CN113657466B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN116452861A (en) Target model training method and device and electronic equipment
CN113361621B (en) Method and device for training model
CN115457365A (en) Model interpretation method and device, electronic equipment and storage medium
CN115081630A (en) Training method of multi-task model, information recommendation method, device and equipment
CN114119972A (en) Model acquisition and object processing method and device, electronic equipment and storage medium
CN113361574A (en) Training method and device of data processing model, electronic equipment and storage medium
CN113792876A (en) Backbone network generation method, device, equipment and storage medium
CN115131709B (en) Video category prediction method, training method and device for video category prediction model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination