CN112580803B - Model acquisition method, apparatus, electronic device, storage medium, and program product - Google Patents

Model acquisition method, apparatus, electronic device, storage medium, and program product

Info

Publication number
CN112580803B
Authority
CN
China
Prior art keywords
target
network
super
connection
intermediate node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011509351.7A
Other languages
Chinese (zh)
Other versions
CN112580803A (en)
Inventor
希滕
张刚
温圣召
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011509351.7A priority Critical patent/CN112580803B/en
Publication of CN112580803A publication Critical patent/CN112580803A/en
Application granted granted Critical
Publication of CN112580803B publication Critical patent/CN112580803B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The present disclosure provides a model acquisition method, a model acquisition apparatus, an electronic device, a storage medium and a program product, and relates to the technical fields of computer vision and deep learning. The specific implementation scheme is as follows: obtaining M first soft labels output by a reference model, wherein the M first soft labels respectively correspond one-to-one to M target connection layers of the reference model; updating the connection parameters of the intermediate nodes of the sub-networks in the super network according to the M first soft labels to obtain a target model; the ith intermediate node of the sub-network is updated based on the first soft label corresponding to the ith target connection layer, and the ith intermediate node is located in the ith connection layer of the M connection layers included in the super network. The method and the apparatus can improve the consistency of performance between the updated target model and the super network.

Description

Model acquisition method, apparatus, electronic device, storage medium, and program product
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to the field of artificial intelligence such as computer vision and deep learning techniques.
Background
With the continuous development of deep learning, deep learning has achieved great success in various fields and has gradually evolved toward fully automatic machine learning. For example, neural architecture search (NAS) is one of the research hotspots of fully automatic machine learning: by designing an efficient search method, a neural network with strong generalization capability and hardware-friendly requirements is obtained automatically, which greatly liberates the creativity of related researchers.
Conventional NAS methods require independently sampling model structures and evaluating their performance, which incurs significant overhead. To reduce this overhead, gradient-based super-network training methods have been investigated, where the super network can be adapted to a variety of different network architecture applications. In the gradient-based super-network training method, the connection with the lowest weight is gradually deleted during super-network training; as connections are gradually deleted, the search space gradually shrinks and finally converges to an optimal structure.
Disclosure of Invention
The present disclosure provides a model acquisition method, apparatus, electronic device, storage medium, and program product.
According to an aspect of the present disclosure, there is provided a model acquisition method including:
obtaining M first soft labels output by a reference model, wherein the M first soft labels respectively correspond to M target connection layers of the reference model one by one;
updating the connection parameters of the intermediate nodes of the sub-networks in the super network according to the M first soft labels to obtain a target model;
the ith intermediate node of the sub-network is updated based on a first soft label corresponding to an ith target connection layer, and the ith intermediate node is located in an ith connection layer in M connection layers included in the super-network.
According to another aspect of the present disclosure, there is provided a model acquisition apparatus including:
the acquisition module is used for acquiring M first soft labels output by the reference model, wherein the M first soft labels are respectively in one-to-one correspondence with M target connection layers of the reference model;
the updating module is used for updating the connection parameters of the intermediate nodes of the sub-networks in the super network according to the M first soft labels to obtain a target model;
the ith intermediate node of the sub-network is updated based on a first soft label corresponding to an ith target connection layer, and the ith intermediate node is located in an ith connection layer in M connection layers included in the super-network.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model acquisition methods provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the model acquisition method provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the model acquisition method provided by the present disclosure.
According to the technical scheme, the super network is updated based on the first soft label in the reference model, so that the consistency of the performance between the updated target model and the super network can be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a model acquisition method provided by the present disclosure;
FIG. 2 is a schematic diagram of a portion of a network architecture in a super network provided by the present disclosure;
FIG. 3 is a schematic diagram of a portion of a network architecture in a target model provided by the present disclosure;
FIG. 4 is one of the block diagrams of a model acquisition apparatus provided by the present disclosure;
FIG. 5 is a second block diagram of a model acquisition apparatus provided by the present disclosure;
fig. 6 is a schematic block diagram of an electronic device provided by the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In recent years, deep learning technology has achieved great success in many directions. In deep learning, the performance of a target model is strongly influenced by the quality of its neural network structure. Manually designing neural network structures requires very extensive experience and numerous attempts, and the huge number of parameters creates an explosive number of combinations, making conventional random search nearly infeasible; therefore, NAS has become a research hotspot.
Conventional NAS methods require independently sampling model structures and evaluating their performance, which incurs significant overhead. To reduce this overhead, super-network-based model training methods greatly accelerate the search for a model structure through parameter sharing. However, consistency is the biggest problem of all super-network-based model training schemes; if it is not solved, there is a very large performance gap between the search result and the expected result. The consistency problem is, specifically: when the target model obtained by a super-network-based training method is applied to a specific scene, the target model often cannot reach the performance of the independently trained network structure corresponding to that scene. That is, there is a performance gap between the trained target model and the super network, so the performance of target models obtained by current super-network-based training methods is poor.
Super-network-based model training schemes include gradient-based super-network training schemes and one-shot super-network training schemes. This embodiment aims to solve the consistency problem of the gradient-based super-network training scheme.
At present, in the super-network training process, the connection with the lowest weight is gradually deleted; as connections are gradually deleted, the search space gradually shrinks, and the training finally converges to an optimal structure.
However, the above scheme cannot guarantee that the lowest-weight connection it deletes is the one with the least influence on the overall performance of the super network, which causes a gap between the performance of the super network and that of an independently trained network structure. In addition, deleting the wrong connections leaves the performance of the super network suboptimal.
Referring to fig. 1, fig. 1 is a flowchart of a model acquisition method provided by the present disclosure, the method including:
step S101, M first soft labels output by a reference model are obtained, wherein the M first soft labels are respectively in one-to-one correspondence with M target connection layers of the reference model.
Specifically, the reference model may be used as a teacher model, and the super network may be used as a student model. The teacher model may be a large model for encoding and decoding image/video data, and may be embodied as different types of models according to actual requirements, for example, a convolutional neural network, a deep neural network, a long short-term memory network, a generative adversarial network, and the like. It should be noted that the teacher model should be a network model structure that is trained in advance and has good performance, so that the student model can be trained based on the teacher model; for example, the student model may be distilled based on the teacher model, so as to improve consistency between the trained target model structure and the super network.
The above-mentioned target connection layer may be an intermediate connection layer of the reference model, i.e., a connection layer other than the input layer and the output layer; for example, the target connection layer may be a convolution layer in the reference model.
The first soft label may represent the reason why the reference model achieves high precision, that is, the "knowledge" possessed by the teacher model. By way of analogy, a person's experience can be abstracted into a textual expression; for a computer model, this knowledge is usually represented as key feature data. Since each intermediate connection layer in the reference model processes the data output by its upper layer based on the "knowledge" it owns, each target connection layer in the reference model carries specific "knowledge". In this embodiment, the M first soft labels respectively represent the "knowledge" owned by the M different target connection layers of the reference model. Specifically, the first soft labels may be obtained as follows: after the reference model is trained to convergence, an auxiliary head may be led out for each target connection layer (block) in the reference model and trained until convergence, after which each block can output a soft label. The expression form of the first soft label may be a set of data formed by combining a plurality of feature data, or may be a feature distribution characterizing the distribution of key feature data.
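For intuition only, the following is a minimal PyTorch sketch of how such per-block soft labels could be produced with auxiliary heads. The class names (AuxiliaryHead, TeacherWithHeads), the head design, the temperature value and the toy blocks are illustrative assumptions, not taken from this disclosure; in practice both the blocks and the heads would first be trained to convergence as described above.

```python
# Hypothetical sketch: attach an auxiliary head to each block of the teacher
# so that, after training to convergence, every block emits its own soft label.
import torch
import torch.nn as nn

class AuxiliaryHead(nn.Module):
    """Small classifier head attached to one intermediate block."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        return self.fc(self.pool(x).flatten(1))

class TeacherWithHeads(nn.Module):
    """Reference (teacher) model whose M blocks each output a soft label."""
    def __init__(self, blocks, channels, num_classes: int):
        super().__init__()
        self.blocks = blocks
        self.heads = nn.ModuleList(AuxiliaryHead(c, num_classes) for c in channels)

    def forward(self, x, temperature: float = 4.0):
        soft_labels = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            # Softened logits of this block serve as its "first soft label".
            soft_labels.append(torch.softmax(head(x) / temperature, dim=-1))
        return soft_labels

# Toy usage with M = 2 convolutional blocks.
blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
])
teacher = TeacherWithHeads(blocks, channels=[16, 32], num_classes=10)
labels = teacher(torch.randn(1, 3, 32, 32))  # list of M per-block soft labels
```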
It should be noted that the above-mentioned M target connection layers are located at different levels of the reference model; for example, the M target connection layers may be sequentially connected layers between the input layer and the output layer of the reference model.
Step S102, updating connection parameters of intermediate nodes of a sub-network in a super-network according to the M first soft labels to obtain a target model, wherein an ith intermediate node of the sub-network is updated based on a first soft label corresponding to an ith target connection layer, and the ith intermediate node is located in an ith connection layer in M connection layers included in the super-network.
The above updating of the connection parameters of the intermediate nodes of the sub-networks in the super-network may refer to: during the process of training the super network to obtain the sub-network, the connection parameters of the intermediate nodes of the sub-network in the super network are updated. The parameter update may take the form of deleting connections between nodes from the super network to reduce the complexity of the super network, and the target model may be the sub-network trained in this process.
Referring to fig. 2, a schematic diagram of a part of a network structure in a super network according to an embodiment of the present disclosure is provided, where 0, 1, 2 and 3 may respectively represent intermediate nodes in four different connection layers of the super network. For convenience of explanation, the connection layer where the intermediate node 0 is located is referred to as the first connection layer, the connection layer where the intermediate node 1 is located is referred to as the second connection layer, the connection layer where the intermediate node 2 is located is referred to as the third connection layer, and the connection layer where the intermediate node 3 is located is referred to as the fourth connection layer. Different connections between intermediate nodes in different connection layers constitute different sub-networks, and a sub-network of the super network may comprise nodes in every connection layer of the super network, or may comprise nodes in only part of the connection layers. For example, as can be seen from fig. 2, the intermediate node 0 in the first connection layer may be directly connected to the intermediate node 3 in the fourth connection layer to form one of the sub-networks of the super network; furthermore, the intermediate node 0 in the first connection layer may be connected to the intermediate node 3 in the fourth connection layer through the intermediate node 1 in the second connection layer and the intermediate node 2 in the third connection layer, forming another sub-network of the super network. Referring to fig. 3, a sub-network is a directed acyclic graph composed of an ordered sequence of nodes.
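As a structural illustration only, the multi-edge pattern of fig. 2 (several candidate connections between a node and its successor, pruned one by one later) could be represented as in the following sketch; the candidate operation set and the names MixedEdge, CANDIDATE_OPS and alive are hypothetical and not part of this disclosure.

```python
# Hypothetical sketch of one super-network edge: all K+1 candidate
# connections between two nodes are kept until they are pruned.
import torch
import torch.nn as nn

CANDIDATE_OPS = {
    "conv3x3": lambda c: nn.Conv2d(c, c, 3, padding=1),
    "conv5x5": lambda c: nn.Conv2d(c, c, 5, padding=2),
    "skip":    lambda c: nn.Identity(),
}

class MixedEdge(nn.Module):
    """K+1 candidate connections between two intermediate nodes."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleDict({n: f(channels) for n, f in CANDIDATE_OPS.items()})
        self.alive = set(self.ops)  # names of connections not yet deleted

    def forward(self, x):
        # The successor node receives the sum of all surviving connections.
        return sum(self.ops[name](x) for name in self.alive)

    def delete(self, name: str):
        self.alive.discard(name)

edge = MixedEdge(channels=16)
x = torch.randn(1, 16, 8, 8)
y = edge(x)             # all three candidate connections contribute
edge.delete("conv5x5")  # one pruning step removes one connection
y = edge(x)             # only the two surviving connections contribute
```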
Referring further to fig. 2, it can be seen that there are multiple different connections between different nodes in the super network. In this embodiment, connections in the super network may be deleted under the guidance of the first soft labels output by the reference model, so as to obtain the target model shown in fig. 3; the performance of the target model is consistent with that of the super network, and at the same time the obtained sub-network can be output directly as the model without retraining.
The above-mentioned super network may include M intermediate connection layers, wherein the connection layers in the super network may be convolution layers. The M target connection layers of the reference model and the M connection layers of the super network may be in a one-to-one correspondence relationship, for example, when the number of connection layers of the reference model and the super network is the same, an ith connection layer in the super network may correspond to an ith target connection layer of the reference model, and the value range of i is 1 to M.
Specifically, the correspondence between the M first soft labels and the M connection layers of the super network may be established based on the M target connection layers of the reference model; for example, a correspondence may be established between the first soft label corresponding to the ith target connection layer and the connection layer corresponding to the ith target connection layer, so that when the connection parameters of an intermediate node of the sub-network are updated, the first soft label corresponding to the connection layer where the intermediate node is located may be used to guide the update of the connection parameters of that intermediate node. In this way, step-by-step supervision can be realized in the process of updating the super network, thereby improving the consistency of performance between the updated target model and the super network.
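As a sketch of this layer-wise guidance, the matched pairs (ith first soft label, output of the ith connection layer) could be penalized by their distance; the following assumes the Euclidean distance that the disclosure later permits, and the function name layerwise_distance is a hypothetical helper.

```python
# Hypothetical sketch: pair the i-th teacher soft label with the output of
# the i-th super-network connection layer and sum the per-layer distances,
# giving the step-by-step supervision described above.
import torch

def layerwise_distance(first_soft_labels, supernet_soft_labels):
    """Sum of Euclidean distances over the matched layers i = 1..M."""
    return sum(
        torch.dist(t, s, p=2)
        for t, s in zip(first_soft_labels, supernet_soft_labels)
    )

# Toy check with M = 3 layers and 10-way soft labels.
teacher = [torch.softmax(torch.randn(10), dim=-1) for _ in range(3)]
student = [torch.softmax(torch.randn(10), dim=-1) for _ in range(3)]
print(layerwise_distance(teacher, student))
```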
As shown in fig. 3, in the above object model, the number of connections between each node may be 1.
It should be noted that, when the method provided in the above embodiment is applied to image processing, the speed of image processing can be improved; at the same time, since the search efficiency of the task is improved, the equipment cost can be reduced.
In the embodiment, the updating process of the M connection layers of the super network is respectively guided based on the M first soft labels in the reference model so as to realize step-by-step supervision, so that the consistency of the performance between the target model obtained after updating and the super network can be improved.
Optionally, each intermediate node in the subnetwork has K+1 connections, where K is an integer greater than 1, and updating, according to the M first soft labels, connection parameters of the intermediate nodes of the subnetwork in the super network includes:
and carrying out K rounds of iterative updating on the sub-network, wherein each round of iterative updating deletes one connection of each intermediate node.
The presence of K+1 connections at an intermediate node may refer to the following: in the sub-network, there are K+1 connections between the intermediate node and the node at the next stage of the intermediate node. For example, referring to fig. 2, in the sub-network formed by the intermediate node 0, the intermediate node 1 and the intermediate node 3, there are 3 connections between the intermediate node 0 and the intermediate node 1, that is, K+1 equals 3 for the intermediate node 0.
The X-th round of updating in the K rounds may be an updating performed on the basis of a first super network, where X is an integer greater than 0, and in the case where X is equal to 1, the first super network is the super network. And under the condition that X is not equal to 1, the first super network is the super network obtained after the X-1 round of updating is carried out on the super network.
Since each intermediate node in the sub-network has K+1 connections, the number of connections between each pair of nodes in the obtained target model may be reduced to 1 by performing the pruning operation on each intermediate node K times.
In the case that the above-mentioned super network includes a plurality of sub-networks, the plurality of sub-networks of the super network may be updated respectively according to a preset order to obtain the target model shown in fig. 3. Specifically, when the a-th sub-network of the super network is updated, the update may be performed on the basis of a second super network, where a is an integer greater than 0; in the case where a is equal to 1, the second super network is the above super network, and in the case where a is not equal to 1, the second super network is the super network obtained after the (a-1)-th sub-network of the super network has been updated.
Furthermore, there are common nodes between different sub-networks in the super network. For example, referring to fig. 2, the sub-network 0, 1, 2, 3 and the sub-network 0, 2, 3 share the common node 2 and the common node 3. If the connections of the common node 2 have already been updated in the process of updating the sub-network 0, 1, 2, 3, that is, the number of connections of the common node 2 is already 1, then when the sub-network 0, 2, 3 is subsequently updated, the connection parameters of node 2 do not need to be updated again. Thus, the above K-round iterative update of the sub-network may refer to: iteratively updating the sub-network until the number of connections of every intermediate node in the sub-network is 1.
It should be noted that, when an update is performed on a certain intermediate node, the connection parameters of other intermediate nodes in the super network should be frozen to avoid that the connection parameters of other intermediate nodes in the super network change during the update.
In this embodiment, in the process of updating the connection parameters of the intermediate nodes of the sub-network, only one connection of each intermediate node is deleted at a time, so that each time the super network is updated, the connection with the least influence on the overall performance of the super network can be deleted based on the current state of the super network. Compared with the prior art, this avoids mistakenly deleting connections that have a larger influence on super-network performance as a result of deleting multiple connections at once.
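The K-round schedule can be summarized in the following sketch, where the super network is reduced to a mapping from each intermediate node to its set of surviving connections; k_round_update and choose_deletion are hypothetical names, and the deletion criterion is stubbed out here (the soft-label criterion is described below).

```python
# Hypothetical sketch of the K-round schedule: every round removes exactly
# one connection from each intermediate node, so a node that starts with
# K+1 connections ends the K rounds with a single connection.
def k_round_update(node_connections: dict, choose_deletion) -> dict:
    """node_connections maps node id -> set of surviving connection names."""
    K = max(len(conns) for conns in node_connections.values()) - 1
    for _ in range(K):
        for node, conns in node_connections.items():
            if len(conns) > 1:  # shared nodes already pruned to 1 are skipped
                conns.discard(choose_deletion(node, conns))
    return node_connections

# Toy run deleting an arbitrary connection each round; a real run would use
# the soft-label distance criterion of the following paragraphs.
nodes = {0: {"conv3x3", "conv5x5", "skip"}, 1: {"conv3x3", "conv5x5", "skip"}}
print(k_round_update(nodes, lambda node, conns: next(iter(conns))))
```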
Optionally, the super network includes an output node, and the j-th round of updating in the K-round of iterative updating includes:
and deleting the connection of the intermediate nodes in the sub-network according to a preset sequence, wherein the preset sequence is a sequence obtained by sequencing from small to large according to the distance between the intermediate nodes in the sub-network and the output node.
Specifically, in the process of processing data, for example in the process of identifying an image, data generally propagates through the model node by node, that is, the output of the previous node serves as the input of the next node; therefore, after the connection parameters of a certain intermediate node are updated, the output of the node following that intermediate node is affected.
Each round of updating needs to delete one connection of each intermediate node of the sub-network. If connections of upper-layer intermediate nodes were deleted first, the deletion would disturb the connection-deletion process of the lower-layer intermediate nodes, which could cause connections with a larger influence on the performance of the super network to be deleted. Therefore, in this embodiment, during the j-th round of the K rounds of iterative updating, the intermediate nodes in the sub-network are updated one by one, starting from the intermediate node closest to the output layer and proceeding away from the output layer. For example, during the updating of the sub-network 0, 1, 2, 3 in fig. 2, the four intermediate nodes may be updated in the following order: intermediate node 3, intermediate node 2, intermediate node 1, intermediate node 0. In this way, when a first intermediate node in the sub-network is updated, none of its upstream intermediate nodes has been updated yet, so the update process of the first intermediate node is not affected by updates of upstream nodes; meanwhile, all of its downstream intermediate nodes have already been updated, so they do not affect its update process either, where the first intermediate node is any intermediate node in the sub-network.
In this embodiment, the update of each intermediate node in the sub-network is performed according to the preset sequence, so that the influence of updating an upper-layer intermediate node on the update process of a lower-layer intermediate node can be avoided, which further improves the accuracy of the update process and thus further improves the consistency between the updated target model and the super network.
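The preset order itself is straightforward to compute once the sub-network is given, for example, as a predecessor map; the following sketch uses a hypothetical helper order_by_distance_to_output and a breadth-first search from the output node.

```python
# Hypothetical sketch: sort intermediate nodes by their distance to the
# output node (smallest first), yielding the preset update order above.
from collections import deque

def order_by_distance_to_output(predecessors: dict, output_node):
    """predecessors[n] lists the nodes feeding node n (reversed edges)."""
    dist = {output_node: 0}
    queue = deque([output_node])
    while queue:
        node = queue.popleft()
        for pred in predecessors.get(node, []):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    nodes = [n for n in dist if n != output_node]
    return sorted(nodes, key=dist.get)

# Chain sub-network 0 -> 1 -> 2 -> 3 -> output, as in fig. 2:
order = order_by_distance_to_output({"out": [3], 3: [2], 2: [1], 1: [0]}, "out")
print(order)  # [3, 2, 1, 0]
```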
Optionally, deleting the connections of the intermediate nodes in the sub-network according to the preset sequence includes:
performing K+1 deleting operations on a target intermediate node to obtain K+1 intermediate super networks, wherein each deleting operation deletes a different connection of the target intermediate node, the target intermediate node is any intermediate node in the sub-network, the target intermediate node is located at a y-th connection layer in the M connection layers, and y is any integer from 1 to M;
determining K+1 second soft labels output by a y-th connection layer in the K+1 intermediate super networks;
and determining the target super network in the K+1 intermediate super networks as an updated super network based on the distance between the K+1 second soft labels and the target first soft label corresponding to the y-th target connection layer.
In particular, the updating of the target intermediate node may be performed on the basis of a third super network. If j is equal to 1 and the target intermediate node is the intermediate node closest to the output node in the sub-network, the third super network is the super network itself, i.e., the super network that has not yet been updated; if j is not equal to 1 and the target intermediate node is the intermediate node closest to the output node in the sub-network, the third super network is the super network obtained after the (j-1)-th round of updating; and if the target intermediate node is not the intermediate node closest to the output node in the sub-network, the third super network is the super network obtained after the intermediate node immediately preceding the target intermediate node in the preset sequence has been updated.
The K+1 deleting operations are all performed on the basis of the third super network. Since the target intermediate node has K+1 connections, the K+1 deleting operations correspond one-to-one to the K+1 connections of the target intermediate node; that is, the K+1 deleting operations are respectively performed in the third super network, each deleting a different one of the K+1 connections of the target intermediate node. By comparing the resulting K+1 intermediate super networks, it can be determined which of the K+1 connections of the target intermediate node can be deleted with the smallest performance loss for the resulting target model. That connection is determined as the target connection to be deleted in the current round of updating, and the intermediate super network in which the target connection has been deleted is determined as the super network obtained after the target intermediate node is updated.
Since the target intermediate node is located at the y-th connection layer of the M connection layers, in order to determine the target connection, the second soft label output by the y-th connection layer of each of the K+1 intermediate super networks may be determined, so as to obtain K+1 second soft labels. The second soft label is similar to the first soft label; for its specific physical meaning and acquisition manner, refer to the description of the first soft label above.
Since the K+1 deleting operations are all performed under the guidance of the target first soft label corresponding to the y-th target connection layer, the K+1 second soft labels can be respectively compared with the target first soft label to determine the target connection among the K+1 connections. Specifically, if after deleting a certain one of the K+1 connections, the second soft label output by the resulting intermediate super network is closer to the target first soft label, that is, the performance of the super network before and after the deletion is equivalent, it indicates that deleting that connection has a smaller influence on the performance of the super network. Conversely, if after deleting a certain connection, the difference between the second soft label output by the intermediate super network and the target first soft label is larger, deleting that connection has a larger influence on the performance of the super network.
The distance may be a norm distance or a Euclidean distance between the soft labels; the distance is not limited to the norm distance or the Euclidean distance, and may be any physical quantity that can represent the difference between soft labels.
In this embodiment, each time the target intermediate node is updated, one connection with the least influence on the performance of the super network in the target intermediate node is deleted, so as to ensure that the performance loss is the least after the connection is deleted, thereby improving the consistency of the performance between the target model obtained after the update and the super network.
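A single target-node update under this rule can be sketched as follows. The sketch is self-contained but fabricated: the connection weights and the stand-in second_soft_label function merely stand in for the y-th connection layer's real output so that the selection logic (keep the deletion whose second soft label stays closest to the target first soft label) can run; every name here is hypothetical.

```python
# Hypothetical, self-contained sketch of one target-node update:
# try each of the K+1 deletions, compare second soft labels with the
# target first soft label, and commit the deletion with minimal distance.
import torch

def second_soft_label(connections: dict, probe: torch.Tensor) -> torch.Tensor:
    # Stand-in for the y-th connection layer's output under the trial net.
    return torch.softmax(sum(w * probe for w in connections.values()), dim=-1)

def update_target_node(connections: dict, target_first_soft_label, probe):
    """connections maps connection name -> weight (K+1 entries)."""
    best_name, best_dist = None, float("inf")
    for name in connections:                      # K+1 trial deletions
        trial = {k: v for k, v in connections.items() if k != name}
        d = torch.dist(second_soft_label(trial, probe),
                       target_first_soft_label, p=2).item()
        if d < best_dist:
            best_name, best_dist = name, d
    del connections[best_name]                    # the "target connection"
    return connections

probe = torch.randn(10)
conns = {"conv3x3": 0.9, "conv5x5": 0.4, "skip": 0.1}  # K+1 = 3 connections
target = torch.softmax(torch.randn(10), dim=-1)        # target first soft label
print(update_target_node(conns, target, probe))
```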
Optionally, the target super network is the super network, among the K+1 intermediate super networks, whose output second soft label has the smallest distance to the target first soft label.
Specifically, if the target super network is the one among the K+1 intermediate super networks whose output second soft label has the smallest distance to the target first soft label, it indicates that the performance of the intermediate super network obtained after the deletion is closest to that of the third super network.
In this embodiment, the super network obtained after the update is determined by selecting, among the K+1 intermediate super networks, the one whose output second soft label has the smallest distance to the target first soft label. In this way, the performance loss of the super network after the target connection of the intermediate node is deleted can be minimized.
Optionally, the number of connection layers included in the reference model is an integer multiple of M, and the number of connection layers spaced between any two target connection layers in the M target connection layers is the same.
Specifically, if the reference model and the super network each have 4 connection layers, the 4 connection layers of the reference model are determined as the target connection layers; in this case, the one-to-one correspondence between the M target connection layers and the M connection layers may be as follows: according to the order of data transmission in the model, the 1st target connection layer of the reference model corresponds to the 1st connection layer of the super network, the 2nd target connection layer corresponds to the 2nd connection layer, the 3rd target connection layer corresponds to the 3rd connection layer, and the 4th target connection layer corresponds to the 4th connection layer. For another example, if the reference model has 8 connection layers and the super network has 4, the 2nd, 4th, 6th and 8th connection layers of the reference model may be determined as the target connection layers; in this case, according to the order of data transmission in the model, the 2nd connection layer of the reference model corresponds to the 1st connection layer of the super network, the 4th to the 2nd, the 6th to the 3rd, and the 8th to the 4th.
In this embodiment, the number of connection layers spaced between any two of the M target connection layers is the same, so that the performance difference between two adjacent target connection layers is relatively balanced and the distances between the obtained M first soft labels are relatively balanced. When the update process of the super network is guided based on these M first soft labels, the stability of performance between the levels of the trained target model is improved, thereby further improving the performance of the trained target model.
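The pairing rule of this paragraph reduces to taking every (n/M)-th layer of the reference model when the reference model has n connection layers; the following one-function sketch, with the hypothetical helper map_target_layers and 1-indexed layers, reproduces the two examples above.

```python
# Hypothetical sketch of the evenly spaced layer pairing: when the reference
# model has n = stride * M connection layers, every stride-th layer becomes
# a target connection layer, matched to super-network layers 1..M in order.
def map_target_layers(num_reference_layers: int, M: int) -> list:
    assert num_reference_layers % M == 0, "must be an integer multiple of M"
    stride = num_reference_layers // M
    return [stride * (i + 1) for i in range(M)]  # 1-indexed target layers

print(map_target_layers(4, 4))  # [1, 2, 3, 4] -> super-network layers 1..4
print(map_target_layers(8, 4))  # [2, 4, 6, 8] -> super-network layers 1..4
```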
Referring to fig. 4, fig. 4 is a model obtaining apparatus 400 provided in an embodiment of the disclosure, including:
the obtaining module 401 is configured to obtain M first soft labels output by a reference model, where the M first soft labels respectively correspond to M target connection layers of the reference model one by one;
an updating module 402, configured to update connection parameters of intermediate nodes of a sub-network in the super-network according to the M first soft labels, to obtain a target model;
the ith intermediate node of the sub-network is updated based on a first soft label corresponding to an ith target connection layer, and the ith intermediate node is located in an ith connection layer in M connection layers included in the super-network.
Optionally, each intermediate node in the subnetwork has K+1 connections, where K is an integer greater than 1;
the updating module 402 is specifically configured to perform K-round iterative updating on the subnetwork, where each round of iterative updating deletes one connection of each intermediate node.
Optionally, the updating module 402 is further specifically configured to delete the connection of the intermediate nodes in the sub-network according to a preset order, where the preset order is an order obtained by sorting from small to large according to a distance between the intermediate nodes in the sub-network and the output node.
Optionally, the updating module 402 includes:
a deleting unit 4021, configured to perform K+1 deleting operations on a target intermediate node to obtain K+1 intermediate super networks, where each deleting operation deletes a different connection of the target intermediate node, the target intermediate node is any intermediate node in the sub-network, the target intermediate node is located in a y-th connection layer of the M connection layers, and y is any integer from 1 to M;
a determining unit 4022, configured to determine K+1 second soft labels output by a y-th connection layer in the K+1 intermediate super networks;
and an updating unit 4023, configured to determine, based on the distance between the K+1 second soft labels and the target first soft label corresponding to the y-th target connection layer, a target super network in the K+1 intermediate super networks as an updated super network.
Optionally, the target super network is the super network, among the K+1 intermediate super networks, whose output second soft label has the smallest distance to the target first soft label.
Optionally, the number of connection layers included in the reference model is an integer multiple of M, and the number of connection layers spaced between any two target connection layers in the M target connection layers is the same.
The device provided in this embodiment can implement each process implemented in the method embodiment shown in fig. 1, and can achieve the same beneficial effects, so that repetition is avoided, and no further description is given here.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, such as a model acquisition method. For example, in some embodiments, the model acquisition method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the model acquisition method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the model acquisition method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described herein may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. An image processing model acquisition method, comprising:
obtaining M first soft labels output by a reference model, wherein the M first soft labels respectively correspond to M target connection layers of the reference model one by one; the target connection layer is a convolution layer in the reference model;
updating the connection parameters of the intermediate nodes of the sub-networks in the super network according to the M first soft labels to obtain a target model;
the ith intermediate node of the sub-network is updated based on a first soft label corresponding to an ith target connection layer, and the ith intermediate node is located in an ith connection layer in M connection layers included in the super-network;
the reference model is used as a teacher model, the super network is used as a student model, and the teacher model is a large model for encoding and decoding image data;
K+1 connections exist for each intermediate node in the subnetwork, K is an integer greater than 1, and the updating of the connection parameters of the intermediate nodes of the subnetwork in the super network according to the M first soft labels includes:
k rounds of iterative updating are carried out on the sub-network, wherein each round of iterative updating deletes one connection of each intermediate node;
the super network comprises an output node, and the j-th round of updating in the K round of iterative updating comprises the following steps:
deleting the connection of the intermediate nodes in the sub-network according to a preset sequence;
the deleting the connections of the intermediate nodes in the sub-network according to the preset sequence comprises the following steps:
performing K+1 deleting operations on a target intermediate node to obtain K+1 intermediate super networks, wherein each deleting operation deletes a different connection of the target intermediate node, the target intermediate node is any intermediate node in the sub-network, the target intermediate node is located at a y-th connection layer in the M connection layers, and y is any integer from 1 to M;
determining K+1 second soft labels output by a y-th connection layer in the K+1 intermediate super networks;
and determining the target super network in the K+1 intermediate super networks as an updated super network based on the distance between the K+1 second soft labels and the target first soft labels corresponding to the y-th target connection layer.
2. The method of claim 1, wherein the preset order is an order ordered from small to large in terms of a distance between an intermediate node in the subnetwork and the output node.
3. The method of claim 1, wherein the target super network is the super network in which the distance between the second soft label output by the K+1 intermediate super networks and the target first soft label is smallest.
4. The method of claim 1, wherein the reference model includes a number of connection layers that is an integer multiple of the M, and a number of connection layers spaced between any two of the M target connection layers is the same.
5. An image processing model acquisition apparatus comprising:
the acquisition module is used for acquiring M first soft labels output by the reference model, wherein the M first soft labels are respectively in one-to-one correspondence with M target connection layers of the reference model; the target connection layer is a convolution layer in the reference model;
the updating module is used for updating the connection parameters of the intermediate nodes of the sub-networks in the super network according to the M first soft labels to obtain a target model;
the ith intermediate node of the sub-network is updated based on a first soft label corresponding to an ith target connection layer, and the ith intermediate node is located in an ith connection layer in M connection layers included in the super-network;
the reference model is used as a teacher model, the super network is used as a student model, and the teacher model is a large model for encoding and decoding image data;
K+1 connections exist for each intermediate node in the sub-network, wherein K is an integer greater than 1;
the updating module is specifically configured to perform K-round iterative updating on the subnetwork, where each round of iterative updating deletes one connection of each intermediate node;
the updating module is specifically configured to delete connections of intermediate nodes in the subnetwork according to a preset sequence;
the updating module comprises:
a deleting unit, configured to perform K+1 deleting operations on a target intermediate node to obtain K+1 intermediate super networks, where each deleting operation deletes a different connection of the target intermediate node, the target intermediate node is any intermediate node in the subnetwork, and the target intermediate node is located in a y-th connection layer of the M connection layers, and y is any integer from 1 to M;
a determining unit, configured to determine K+1 second soft labels output by a y-th connection layer in the K+1 intermediate super networks;
and the updating unit is used for determining the target super network in the K+1 intermediate super networks as an updated super network based on the distance between the K+1 second soft labels and the target first soft labels corresponding to the y-th target connection layer.
6. The apparatus of claim 5, wherein the predetermined order is an order ordered from small to large by a distance between an intermediate node and an output node in the subnetwork.
7. The apparatus of claim 5, wherein the target super network is the super network in which the distance between the second soft label output by the K+1 intermediate super networks and the target first soft label is smallest.
8. The apparatus of claim 5, wherein the reference model includes a number of connection layers that is an integer multiple of the M, and a number of connection layers spaced between any two of the M target connection layers is the same.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 4.
10. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1 to 4.
CN202011509351.7A 2020-12-18 2020-12-18 Model acquisition method, apparatus, electronic device, storage medium, and program product Active CN112580803B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011509351.7A CN112580803B (en) 2020-12-18 2020-12-18 Model acquisition method, apparatus, electronic device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011509351.7A CN112580803B (en) 2020-12-18 2020-12-18 Model acquisition method, apparatus, electronic device, storage medium, and program product

Publications (2)

Publication Number Publication Date
CN112580803A (en) 2021-03-30
CN112580803B (en) 2024-01-09

Family

ID=75136168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011509351.7A Active CN112580803B (en) 2020-12-18 2020-12-18 Model acquisition method, apparatus, electronic device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN112580803B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523664A (en) * 2020-04-23 2020-08-11 北京百度网讯科技有限公司 Super-network parameter updating method and device and electronic equipment
CN111582454A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Method and device for generating neural network model
CN111783950A (en) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 Model obtaining method, device, equipment and storage medium based on hyper network
CN111860495A (en) * 2020-06-19 2020-10-30 上海交通大学 Hierarchical network structure searching method and device and readable storage medium
RU2735572C1 (en) * 2019-06-06 2020-11-03 Бейджин Сяоми Интеллиджент Текнолоджи Ко., Лтд. Method and device for training super network
CN111967591A (en) * 2020-06-29 2020-11-20 北京百度网讯科技有限公司 Neural network automatic pruning method and device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105029B (en) * 2018-10-29 2024-04-16 北京地平线机器人技术研发有限公司 Neural network generation method, generation device and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2735572C1 (en) * 2019-06-06 2020-11-03 Бейджин Сяоми Интеллиджент Текнолоджи Ко., Лтд. Method and device for training super network
CN111523664A (en) * 2020-04-23 2020-08-11 北京百度网讯科技有限公司 Super-network parameter updating method and device and electronic equipment
CN111582454A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Method and device for generating neural network model
CN111860495A (en) * 2020-06-19 2020-10-30 上海交通大学 Hierarchical network structure searching method and device and readable storage medium
CN111783950A (en) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 Model obtaining method, device, equipment and storage medium based on hyper network
CN111967591A (en) * 2020-06-29 2020-11-20 北京百度网讯科技有限公司 Neural network automatic pruning method and device and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Sample-Efficient Neural Architecture Search by Learning Action Space";Linnan Wang 等;《arXiv》;全文 *
基于神经网络结构搜索的目标识别方法;卞伟伟;邱旭阳;申研;;空军工程大学学报(自然科学版)(04);全文 *
高效深度神经网络综述;闵锐;;电信科学(04);全文 *

Also Published As

Publication number Publication date
CN112580803A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN112597754B (en) Text error correction method, apparatus, electronic device and readable storage medium
CN111582454B (en) Method and device for generating neural network model
CN111241838B (en) Semantic relation processing method, device and equipment for text entity
CN114357105B (en) Pre-training method and model fine-tuning method of geographic pre-training model
CN112580733B (en) Classification model training method, device, equipment and storage medium
CN115860128B (en) Quantum circuit operation method and device and electronic equipment
CN111652354B (en) Method, apparatus, device and storage medium for training super network
CN111783950A (en) Model obtaining method, device, equipment and storage medium based on hyper network
US20230162041A1 (en) Neural network model, method, electronic device, and readable medium
CN114428907A (en) Information searching method and device, electronic equipment and storage medium
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN111783951B (en) Model acquisition method, device, equipment and storage medium based on super network
CN116151384B (en) Quantum circuit processing method and device and electronic equipment
KR20220003444A (en) Optimizer learning method and apparatus, electronic device and readable storage medium
CN116151381B (en) Quantum circuit processing method and device and electronic equipment
CN115809688B (en) Model debugging method and device, electronic equipment and storage medium
CN112580803B (en) Model acquisition method, apparatus, electronic device, storage medium, and program product
CN113691403B (en) Topology node configuration method, related device and computer program product
CN111177479A (en) Method and device for acquiring feature vectors of nodes in relational network graph
CN113127357B (en) Unit test method, apparatus, device, storage medium, and program product
CN111160552B (en) News information recommendation processing method, device, equipment and computer storage medium
CN111539225B (en) Searching method and device for semantic understanding framework structure
CN113868254A (en) Method, device and storage medium for removing duplication of entity node in graph database
CN112784962A (en) Training method and device for hyper network, electronic equipment and storage medium
CN116611527B (en) Quantum circuit processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant