WO2018227800A1 - Neural network training method and device - Google Patents

Neural network training method and device

Info

Publication number
WO2018227800A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
output data
network
similarity
output
Prior art date
Application number
PCT/CN2017/102032
Other languages
English (en)
Chinese (zh)
Inventor
王乃岩
陈韫韬
Original Assignee
北京图森未来科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京图森未来科技有限公司
Publication of WO2018227800A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to the field of computer vision, and in particular to a neural network training method and apparatus.
  • A deep neural network model often contains a large number of parameters, making it computationally intensive and slow, so it cannot run in real time on some low-power, low-compute devices (such as embedded devices and integrated devices).
  • The knowledge of the teacher network (the teacher network generally has a complex network structure, high accuracy, and slow calculation speed) is transferred to the student network through knowledge migration.
  • The student network has a relatively simple network structure, lower accuracy, and fast calculation.
  • The student network obtained in this way can be applied to devices with low power consumption and low computing power.
  • Knowledge migration is a general technique for compressing and accelerating deep neural network models.
  • Examples of knowledge migration methods include Knowledge Distill (KD) and Attention Transfer (AT).
  • Existing knowledge migration methods use only the information of single data items in the output data of the teacher network to train the student network. Although the trained student network shows some improvement in performance, there is still much room for improvement.
  • Knowledge transfer: in deep neural networks, knowledge transfer refers to using the output of training sample data at an intermediate or final network layer of the teacher network to help train a student network that is faster but performs worse, thereby migrating the knowledge of a high-performing teacher network to the student network.
  • Knowledge Distill: in deep neural networks, knowledge distillation refers to training the student network using the smoothed class posterior probabilities output by the teacher network in a classification problem (a background sketch of this technique follows these definitions).
  • Teacher Network: a high-performance neural network used to provide more accurate supervision information for the student network during the knowledge migration process.
  • Student Network: a single neural network with fast calculation but poorer performance, suitable for deployment in scenarios with high real-time requirements.
  • Compared with the teacher network, the student network has greater computational throughput and fewer model parameters.
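  • As background to the Knowledge Distill technique defined above, a minimal sketch (PyTorch-style Python) of the classic temperature-smoothed distillation loss is shown below. This is the generic KD baseline only, not the cross-sample-similarity method introduced later in this document, and the temperature value and function name are assumptions of the sketch.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    # Smoothed class posterior of the teacher (the "soft targets").
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    # Log-probabilities of the student at the same temperature.
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    # KL divergence between teacher and student distributions; the T^2 factor
    # keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
```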
  • the present invention provides a neural network training method and apparatus to further improve the performance and accuracy of a student network.
  • An aspect provides a neural network training method, the method including: selecting a teacher network that implements the same function as a student network; and iteratively training the student network to obtain a target network based on matching the similarity between the data of the first output data and the data of the second output data corresponding to the same training sample data, so as to migrate the similarity between the output data of the teacher network to the student network; wherein:
  • the first output data is data output from the first specific network layer of the teacher network after the training sample data is input into the teacher network
  • the second output data is data output from the second specific network layer of the student network after the training sample data is input into the student network.
  • a neural network training device comprising:
  • the selection unit is used to select a teacher network that implements the same function as the student network;
  • a training unit configured to iteratively train the student network to obtain a target network based on matching the similarity between the data of the first output data and the data of the second output data corresponding to the same training sample data, so as to migrate the similarity between the output data of the teacher network to the student network;
  • the first output data is data output from the first specific network layer of the teacher network after the training sample data is input into the teacher network
  • the second output data is data output from the second specific network layer of the student network after the training sample data is input into the student network.
  • A neural network training apparatus is further provided, comprising: a processor and at least one memory, the at least one memory storing at least one machine executable instruction, and the processor executing the at least one instruction to: select a teacher network that implements the same function as the student network; and iteratively train the student network to obtain a target network based on matching the similarity between the data of the first output data and the data of the second output data corresponding to the same training sample data, so as to migrate the similarity between the output data of the teacher network to the student network; wherein:
  • the first output data is data output from the first specific network layer of the teacher network after the training sample data is input into the teacher network
  • the second output data is data output from the second specific network layer of the student network after the training sample data is input into the student network.
  • Through the above solution, the similarity between the data of the output data produced by the teacher network for the sample training data can be completely migrated to the student network, so that the result output by the target network for the training sample data is consistent with the result output by the teacher network. Owing to the good generalization ability of neural networks, the outputs of the target network and the teacher network are also basically the same on the test set, thereby improving the accuracy of the student network.
  • FIG. 1 is a flowchart of a neural network training method according to an embodiment of the present invention.
  • FIG. 2 is a flow chart of training a student network in an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a neural network training apparatus according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a training unit according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a neural network training apparatus according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a neural network training method according to an embodiment of the present invention, where the method includes:
  • Step 101 Select a teacher network that implements the same function as the student network.
  • The teacher network has excellent performance and high accuracy, but compared with the student network its structure is more complex, it has more parameter weights, and its calculation speed is slower.
  • The student network is fast, but its performance is generally poorer or its network structure is simpler.
  • A network that implements the same function as the student network and has excellent performance may be selected from a set of preset neural network models as the teacher network.
  • Step 102: Iteratively train the student network to obtain a target network based on matching the similarity between the data of the first output data and the data of the second output data corresponding to the same training sample data, so as to migrate the similarity between the output data of the teacher network to the student network.
  • the first output data is data output from the first specific network layer of the teacher network after the training sample data is input into the teacher network
  • the second output data is data output from the second specific network layer of the student network after the training sample data is input into the student network.
  • After the training sample data is input into the teacher network, the data output from the first specific network layer of the teacher network is collectively referred to as the first output data; after the training sample data is input into the student network, the data output from the second specific network layer of the student network is collectively referred to as the second output data.
  • the first specific network layer is an intermediate network layer or a last layer network layer in the teacher network.
  • the second specific network layer is an intermediate network layer or a last layer network layer of the student network.
  • A specific implementation of step 102 may follow the flow shown in FIG. 2, which specifically includes:
  • Step 102A: Construct an objective function of the student network, where the objective function includes a matching function of the similarity between the data of the first output data and the data of the second output data corresponding to the training sample data.
  • Step 102B: Perform iterative training on the student network by using the training sample data.
  • Step 102C: When the number of iterative trainings reaches a threshold or the objective function satisfies a preset convergence condition, the target network is obtained.
  • the specific implementation may be as follows:
  • The training sample data used for one iterative training is referred to as current training sample data, and each iterative training includes the following Step A, Step B, Step C, Step D, Step E and Step F:
  • Step A: Input the current training sample data for the current iterative training into the teacher network and the student network, respectively, to obtain the corresponding first output data and second output data;
  • Step B: Calculate the similarity between the data in the first output data and calculate the similarity between the data in the second output data;
  • Step C: Calculate the probability of every arrangement order of the data in the first output data according to the similarity between the data in the first output data, and select a target arrangement order from all the arrangement orders of the data in the first output data;
  • Step D: Calculate, according to the similarity between the data in the second output data, the probability of the target arrangement order of the data in the second output data;
  • Step E: Calculate the value of the objective function according to the probability of the target arrangement order of the data in the first output data and the probability of the target arrangement order of the data in the second output data, and adjust the weights of the student network according to the value of the objective function;
  • Step F: Perform the next iterative training based on the student network with the adjusted weights.
  • The target arrangement order is selected from all the arrangement orders of the data in the first output data; implementations include but are not limited to the following two:
  • From all the arrangement orders of the data in the first output data, select the arrangement orders whose probability values are greater than a preset threshold as the target arrangement order; or,
  • From all the arrangement orders of the data in the first output data, select the arrangement orders whose probability values rank within the top preset number as the target arrangement order.
  • The selected target arrangement order may be one or more, which is not strictly limited in this application.
  • In Step B, calculating the similarity between the data in the first output data (or the second output data) specifically includes: calculating the spatial distance between each pair of data in the first output data (or the second output data), and obtaining the similarity between the pair of data according to the spatial distance.
  • The spatial distance may be a Euclidean distance, a cosine distance, a city-block (Manhattan) distance, or a Mahalanobis distance, which the present application does not strictly limit. The following takes calculating the Euclidean distance and the cosine distance between pairs of data as examples.
  • In the Euclidean-distance case, the similarity is obtained from the l2 norm of the difference between the two data vectors, using a preset scale transformation factor, a preset contrast expansion factor, and an offset.
  • In the cosine-distance case, the similarity is obtained from the dot product between the two data vectors, using a preset scale transformation factor, a preset contrast expansion factor, and an offset.
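  • As an illustration of this step, a minimal sketch (PyTorch-style Python) of turning pairwise Euclidean or cosine distances into a similarity matrix is given below; exactly how the scale transformation factor, contrast expansion factor and offset enter the formula is an assumption of the sketch, since the original similarity formulas are not reproduced here.

```python
import torch
import torch.nn.functional as F

def euclidean_similarity(feats, alpha=1.0, beta=2.0, gamma=0.0):
    # feats: (n, d) tensor, one row per data item of the first (or second) output data.
    dist = torch.cdist(feats, feats, p=2)       # (n, n) pairwise l2 distances
    # Assumed form: scale factor alpha, contrast expansion exponent beta, offset gamma;
    # the negation makes closer pairs more similar.
    return -alpha * dist.pow(beta) + gamma

def cosine_similarity_matrix(feats, alpha=1.0, gamma=0.0):
    # Cosine similarity via normalised dot products; the preset factors
    # could be applied analogously.
    normed = F.normalize(feats, dim=1)
    return alpha * (normed @ normed.t()) + gamma
```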
  • Calculating the probability of every arrangement order of the data in the first output data according to the similarity between the data in the first output data may specifically be implemented as follows: for each arrangement order, the order information of that arrangement order and the similarities between all adjacent pairs of data of the first output data in that arrangement order are input into a preset probability calculation model to obtain the probability of that arrangement order.
  • Calculating the probability of the target arrangement order of the data in the second output data according to the similarity between the data in the second output data may specifically be implemented as follows: for each target arrangement order, the order information of that target arrangement order and the similarities between all adjacent pairs of data of the second output data in that target arrangement order are input into the probability calculation model to obtain the probability of that target arrangement order.
  • the probability calculation model may be a first-order Plackett probability model, a high-order Plackett probability model, or other models capable of calculating a probability, which is not strictly limited.
  • The following takes the first-order Plackett probability model as an example for calculating the probability of an arrangement order.
  • In that model, f(·) is any linear or non-linear mapping function, and the sum of the probabilities of all the arrangement orders is 1.
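  • A minimal sketch of the first-order Plackett probability model is given below (PyTorch-style Python); using exp as the mapping function f and deriving one score per data item from the similarities are assumptions of the sketch.

```python
import torch

def plackett_luce_log_prob(scores, order):
    # scores: (n,) score per data item (e.g. its similarity to an anchor item);
    # order:  (n,) long tensor, the arrangement order (a permutation of 0..n-1).
    s = scores[order]
    # For each position i, log of sum_{k >= i} exp(s_k), computed right-to-left.
    log_remaining = torch.logcumsumexp(s.flip(0), dim=0).flip(0)
    # P(order) = prod_i exp(s_i) / sum_{k >= i} exp(s_k); return its log.
    return (s - log_remaining).sum()
```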
  • the target arrangement order may be one or multiple.
  • The objective function of the student network may include only the matching function, or may be the sum of the matching function and a task loss function; the expression of the task loss function is related to the task to be implemented by the student network.
  • the task loss function can be the same as the objective function of the teacher network.
  • the expression of the matching function can be, but is not limited to, the following formula (3) and formula (4).
  • Example 1: When there is one target arrangement order, the objective function of the student network can be set as shown in formula (3), defined in terms of P(π_t | X_s), where π_t is the target arrangement order of the data in the first output data corresponding to the current training sample data, X_s is the second output data corresponding to the current training sample data, and P(π_t | X_s) is the probability of that target arrangement order for the data in the second output data.
  • The foregoing target arrangement order π_t is the arrangement order whose probability value is the largest among all the arrangement orders of the data in the first output data of the current training sample data.
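  • A minimal sketch of this single-target-order objective is given below, reusing the plackett_luce_log_prob helper sketched above; treating formula (3) as the negative log-probability of the teacher's most probable order under the student's output, and obtaining that order as the descending sort of the teacher scores, are assumptions of the sketch.

```python
import torch

def hard_match_loss(teacher_scores, student_scores):
    # Under the Plackett model with f = exp, the most probable arrangement order
    # simply sorts the teacher scores in descending order (this plays the role of pi_t).
    pi_t = torch.argsort(teacher_scores, descending=True)
    # Assumed form of formula (3): L = -log P(pi_t | X_s).
    return -plackett_luce_log_prob(student_scores, pi_t)
```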
  • the embodiment of the present invention may train the student network based on a manner of matching a probability distribution of a plurality of target arrangement orders.
  • There are various methods for matching the probability distributions of a plurality of target arrangement orders, such as the total variation distance between probability distributions, the Wasserstein distance, the Jensen-Shannon divergence, or the Kullback-Leibler divergence.
  • The objective function of the student network may then be as shown in formula (4), where π is a target arrangement order, X_s is the second output data corresponding to the current training sample data, X_t is the first output data corresponding to the current training sample data, P(π | X_s) is the probability that the data in the second output data of the current training sample data are in the arrangement order π, P(π | X_t) is the probability that the data in the first output data of the current training sample data are in the arrangement order π, and Q is the set of target arrangement orders.
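  • A minimal sketch of this multi-target-order objective is given below, again reusing plackett_luce_log_prob; reading formula (4) as the Kullback-Leibler divergence between the teacher's and the student's probabilities over the set Q, and renormalising both distributions over Q, are assumptions of the sketch.

```python
import torch

def soft_match_loss(teacher_scores, student_scores, target_orders):
    # target_orders: the set Q, a list of (n,) long tensors (target arrangement orders).
    log_pt = torch.stack([plackett_luce_log_prob(teacher_scores, pi) for pi in target_orders])
    log_ps = torch.stack([plackett_luce_log_prob(student_scores, pi) for pi in target_orders])
    # Renormalise both distributions over the finite set Q.
    log_pt = log_pt - torch.logsumexp(log_pt, dim=0)
    log_ps = log_ps - torch.logsumexp(log_ps, dim=0)
    # Assumed form of formula (4): sum over Q of P(pi|X_t) * log(P(pi|X_t) / P(pi|X_s)).
    return (log_pt.exp() * (log_pt - log_ps)).sum()
```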
  • In the foregoing Step E, adjusting the weights of the student network according to the value of the objective function includes: adopting a preset gradient descent optimization algorithm and adjusting the weights of the student network according to the value of the objective function.
  • Between the foregoing Step A and Step B, the method may further include the following step: processing the first output data and the second output data by a downsampling algorithm and an interpolation algorithm, so that the spatial dimension of the first output data is consistent with the spatial dimension of the second output data, and the number of the first output data and the number of the second output data are both consistent with the number of the current training sample data.
  • When the spatial dimensions and numbers are already consistent, this step is not needed between Step A and Step B, that is, Step B is executed directly after Step A.
  • The aforementioned spatial dimension generally refers to the number of inputs, the number of channels, and the height and width of the feature map.
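  • A minimal sketch of this alignment step is given below; interpolating the student feature map to the teacher's spatial size and then flattening each sample into one vector is only one possible choice of downsampling/interpolation scheme, assumed for illustration.

```python
import torch.nn.functional as F

def align_outputs(teacher_feat, student_feat):
    # teacher_feat, student_feat: (n, c, h, w) feature maps from the chosen
    # first and second specific network layers (n = number of training samples).
    if student_feat.shape[2:] != teacher_feat.shape[2:]:
        # Resample the student map so its height/width match the teacher map.
        student_feat = F.interpolate(student_feat, size=teacher_feat.shape[2:],
                                     mode="bilinear", align_corners=False)
    n = teacher_feat.shape[0]
    # One vector per training sample, so the number of output data items on each
    # side equals the number of current training sample data.
    return teacher_feat.reshape(n, -1), student_feat.reshape(n, -1)
```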
  • The foregoing Steps A to F do not have to follow a strict sequence, and the following Step A' and Step B' may be used instead of the above Step A and Step B.
  • Step A' inputting current training sample data for the iterative training into the teacher network, obtaining corresponding first output data, and calculating a similarity between the data in the first output data;
  • Step B' inputting the current training sample data into the student network, obtaining corresponding second output data, and calculating a similarity between the data in the second output data.
  • For example, suppose the training sample data are y1, y2 and y3; inputting them into the teacher network yields the corresponding first output data, and inputting them into the student network yields the corresponding second output data.
  • In this example, all the arrangement orders of the data in the first output data are used as target arrangement orders.
  • For the i-th training sample data, the set of target arrangement orders of its first output data is formed, and the probability of each target arrangement order of that first output data is calculated.
  • An arrangement in which the data of the first output data and the data of the second output data are arranged in the same order is treated as the same target arrangement order, so the second output data of the i-th training sample data is matched with its first output data.
  • In the first iterative training, y1 is input into the teacher network and the student network to obtain the corresponding first output data and second output data; the similarity between the data in the first output data and the similarity between the data in the second output data are calculated; from the similarities in the first output data, the probability of every arrangement order of its data is calculated, and all these arrangement orders are used as target arrangement orders; from the similarities in the second output data, the probability of each target arrangement order of its data is calculated; the probabilities of the target arrangement orders of the first output data corresponding to y1 and the probabilities of the target arrangement orders of the second output data are input into the objective function, the value L1 of the objective function is calculated, and the current weight W0 of the student network is adjusted according to L1 to obtain the adjusted weight W1;
  • In the second iterative training, y2 is input into the teacher network and the student network to obtain the corresponding first output data and second output data; the similarities are calculated in the same way, all the arrangement orders of the first output data are used as target arrangement orders and their probabilities are calculated, and the probabilities of those target arrangement orders are calculated for the second output data; the probabilities corresponding to y2 are input into the objective function, the value L2 of the objective function is calculated, and the current weight W1 of the student network is adjusted according to L2 to obtain the adjusted weight W2;
  • In the third iterative training, y3 is input into the teacher network and the student network to obtain the corresponding first output data and second output data; the similarities are calculated in the same way, all the arrangement orders of the first output data are used as target arrangement orders and their probabilities are calculated, and the probabilities of those target arrangement orders are calculated for the second output data; the probabilities corresponding to y3 are input into the objective function, the value L3 of the objective function is calculated, and the current weight W2 of the student network is adjusted according to L3 to obtain the adjusted weight W3.
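  • Putting the pieces together, a minimal sketch of the iterative training loop described above is given below, built from the helpers sketched earlier (align_outputs, euclidean_similarity, hard_match_loss / soft_match_loss); the choice of SGD, the learning rate, taking the chosen layers' outputs directly from the forward pass, and using each batch's first sample as the anchor for the similarity scores are all assumptions of the sketch.

```python
import torch

def train_student(teacher, student, loader, epochs=1, lr=1e-3):
    # "Preset gradient descent optimization algorithm" for adjusting the student weights.
    optimizer = torch.optim.SGD(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(epochs):
        for x, _ in loader:                       # labels unused in this sketch
            with torch.no_grad():
                t_feat = teacher(x)               # first output data (chosen teacher layer)
            s_feat = student(x)                   # second output data (chosen student layer)
            t_vec, s_vec = align_outputs(t_feat, s_feat)
            # Per-sample scores: similarity of every sample to the batch's first sample.
            t_scores = euclidean_similarity(t_vec)[0]
            s_scores = euclidean_similarity(s_vec)[0]
            loss = hard_match_loss(t_scores, s_scores)   # or soft_match_loss(...) for formula (4)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```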
  • the second embodiment of the present invention provides a neural network training device.
  • the structure of the device is as shown in FIG. 3, and includes:
  • the selecting unit 31 is configured to select a teacher network that implements the same function as the student network;
  • The training unit 32 is configured to iteratively train the student network to obtain a target network based on matching the similarity between the data of the first output data and the data of the second output data corresponding to the same training sample data, so as to migrate the similarity between the output data of the teacher network to the student network;
  • the first output data is data output from the first specific network layer of the teacher network after the training sample data is input into the teacher network
  • the second output data is data output from the second specific network layer of the student network after the training sample data is input into the student network.
  • The functions implemented by the teacher network and the student network may be image classification, object detection, image segmentation, and the like.
  • The teacher network has excellent performance and high accuracy, but compared with the student network its structure is more complex, it has more parameter weights, and its calculation speed is slower.
  • The student network is fast, but its performance is generally poorer or its network structure is simpler.
  • The selecting unit 31 may select, from a set of preset neural network models, a network that implements the same function as the student network and has excellent performance as the teacher network.
  • The first specific network layer is an intermediate network layer or the last network layer of the teacher network; and/or the second specific network layer is an intermediate network layer or the last network layer of the student network.
  • the structure of the training unit 32 is as shown in FIG. 4, and specifically includes a construction module 321, a training module 322, and a determination module 323, where:
  • a building module 321 is configured to construct an objective function of the student network, where the objective function includes a matching function of the similarity between the data of the first output data corresponding to the training sample data and the data of the second output data;
  • the training module 322 is configured to perform iterative training on the student network by using the training sample data
  • The determining module 323 is configured to obtain the target network when the number of iterative trainings performed by the training module 322 reaches a threshold or the objective function satisfies a preset convergence condition.
  • the training module 322 is specifically configured to:
  • The training sample data used for one iterative training is referred to as current training sample data, and each iterative training includes the following Step A, Step B, Step C, Step D, Step E and Step F:
  • Step A: Input the current training sample data for the current iterative training into the teacher network and the student network, respectively, to obtain the corresponding first output data and second output data;
  • Step B: Calculate the similarity between the data in the first output data and calculate the similarity between the data in the second output data;
  • Step C: Calculate the probability of every arrangement order of the data in the first output data according to the similarity between the data in the first output data, and select a target arrangement order from all the arrangement orders of the data in the first output data;
  • Step D: Calculate, according to the similarity between the data in the second output data, the probability of the target arrangement order of the data in the second output data;
  • Step E: Calculate the value of the objective function according to the probability of the target arrangement order of the data in the first output data and the probability of the target arrangement order of the data in the second output data, and adjust the weights of the student network according to the value of the objective function;
  • Step F: Perform the next iterative training based on the student network with the adjusted weights.
  • That the training module 322 selects a target arrangement order from all the arrangement orders of the data in the first output data specifically includes: selecting, from all the arrangement orders of the data in the first output data, the arrangement orders whose probability values are greater than a preset threshold as the target arrangement order; or selecting, from all the arrangement orders of the data in the first output data, the arrangement orders whose probability values rank within the top preset number as the target arrangement order.
  • That the training module 322 calculates the similarity between the data in the first output data specifically includes: calculating the spatial distance between each pair of data in the first output data, and obtaining the similarity between the pair of data according to the spatial distance;
  • That the training module 322 calculates the similarity between the data in the second output data specifically includes: calculating the spatial distance between each pair of data in the second output data, and obtaining the similarity between the pair of data according to the spatial distance.
  • The spatial distance may be a Euclidean distance, a cosine distance, a city-block (Manhattan) distance, or a Mahalanobis distance, which the present application does not strictly limit. The following takes calculating the Euclidean distance and the cosine distance between pairs of data as examples.
  • That the training module 322 calculates the probability of every arrangement order of the data in the first output data according to the similarity between the data in the first output data specifically includes: for each arrangement order, inputting the order information of that arrangement order and the similarities between all adjacent pairs of data of the first output data in that arrangement order into a preset probability calculation model to obtain the probability of that arrangement order;
  • That the training module 322 calculates the probability of the target arrangement order of the data in the second output data according to the similarity between the data in the second output data specifically includes: for each target arrangement order, inputting the order information of that target arrangement order and the similarities between all adjacent pairs of data of the second output data in that target arrangement order into the probability calculation model to obtain the probability of that target arrangement order.
  • the probability calculation model may be a first-order Plackett probability model, a high-order Plackett probability model, or other models capable of calculating a probability, which is not strictly limited.
  • the target arrangement order may be one or multiple.
  • the embodiment of the present invention may train the student network based on a manner of matching a probability distribution of a plurality of target arrangement orders.
  • There are various methods for matching the probability distributions of a plurality of target arrangement orders, such as the total variation distance between probability distributions, the Wasserstein distance, the Jensen-Shannon divergence, or the Kullback-Leibler divergence.
  • The objective function of the student network may include only the matching function, or may be the sum of the matching function and a task loss function.
  • the expression of the task loss function is related to the task to be implemented by the student network.
  • the task loss function can be the same as the objective function of the teacher network.
  • That the training module 322 adjusts the weights of the student network according to the value of the objective function specifically includes: adopting a preset gradient descent optimization algorithm and adjusting the weights of the student network according to the value of the objective function.
  • The training module 322 is further configured to: before calculating the similarity between the data in the first output data and the similarity between the data in the second output data, process the first output data and the second output data by a downsampling algorithm and an interpolation algorithm, so that the spatial dimension of the first output data is consistent with the spatial dimension of the second output data, and the number of the first output data and the number of the second output data are both consistent with the number of the current training sample data.
  • The foregoing Steps A to F do not have to follow a strict sequence, and the following Step A' and Step B' may be used instead of the above Step A and Step B.
  • Step A' inputting current training sample data for the iterative training into the teacher network, obtaining corresponding first output data, and calculating a similarity between the data in the first output data;
  • Step B' inputting the current training sample data into the student network, obtaining corresponding second output data, and calculating a similarity between the data in the second output data.
  • the third embodiment of the present invention provides a neural network training device.
  • the structure of the device is as shown in FIG. 5, including: a processor 501 and at least one memory 502.
  • The at least one memory 502 is configured to store at least one machine executable instruction, and the processor 501 executes the at least one instruction to: select a teacher network that implements the same function as the student network; and iteratively train the student network to obtain a target network based on matching the similarity between the data of the first output data and the data of the second output data corresponding to the same training sample data, so as to migrate the similarity between the output data of the teacher network to the student network;
  • the first output data is data output from the first specific network layer of the teacher network after the training sample data is input into the teacher network
  • the second output data is data output from the second specific network layer of the student network after the training sample data is input into the student network.
  • That the processor 501 executes the at least one instruction to iteratively train the student network to obtain a target network based on matching the similarity between the data of the first output data and the data of the second output data corresponding to the same training sample data specifically includes: constructing an objective function of the student network, the objective function including a matching function of the similarity between the data of the first output data and the data of the second output data corresponding to the training sample data; performing iterative training on the student network by using the training sample data; and obtaining the target network when the number of iterative trainings reaches a threshold or the objective function satisfies a preset convergence condition.
  • That the processor 501 executes the at least one instruction to perform iterative training on the student network by using the training sample data specifically includes performing the following iterative training on the student network: inputting the current training sample data used for the iterative training into the teacher network and the student network, respectively, to obtain corresponding first output data and second output data; calculating the similarity between the data in the first output data and the similarity between the data in the second output data; calculating, according to the similarity between the data in the first output data, the probability of every arrangement order of the data in the first output data, and selecting a target arrangement order from all the arrangement orders of the data in the first output data; calculating, according to the similarity between the data in the second output data, the probability of the target arrangement order of the data in the second output data; calculating the value of the objective function according to the probability of the target arrangement order of the data in the first output data and the probability of the target arrangement order of the data in the second output data, and adjusting the weights of the student network according to the value of the objective function; and performing the next iterative training based on the student network with the adjusted weights.
  • That the processor 501 executes the at least one instruction to select a target arrangement order from all the arrangement orders of the data in the first output data specifically includes: selecting, from all the arrangement orders of the data in the first output data, the arrangement orders whose probability values are greater than a preset threshold as the target arrangement order; or selecting, from all the arrangement orders of the data in the first output data, the arrangement orders whose probability values rank within the top preset number as the target arrangement order.
  • That the processor 501 executes the at least one instruction to calculate the similarity between the data in the first output data specifically includes: calculating the spatial distance between each pair of data in the first output data, and obtaining the similarity between the pair of data according to the spatial distance; calculating the similarity between the data in the second output data specifically includes: calculating the spatial distance between each pair of data in the second output data, and obtaining the similarity between the pair of data according to the spatial distance.
  • That the processor 501 executes the at least one instruction to calculate the probability of every arrangement order of the data in the first output data according to the similarity between the data in the first output data specifically includes: for each arrangement order, inputting the order information of that arrangement order and the similarities between all adjacent pairs of data of the first output data in that arrangement order into a preset probability calculation model to obtain the probability of that arrangement order; calculating the probability of the target arrangement order of the data in the second output data according to the similarity between the data in the second output data specifically includes: for each target arrangement order, inputting the order information of that target arrangement order and the similarities between all adjacent pairs of data of the second output data in that target arrangement order into the probability calculation model to obtain the probability of that target arrangement order.
  • The objective function of the student network is as shown in the foregoing formula (3), where π_t is the target arrangement order of the data in the first output data corresponding to the current training sample data, X_s is the second output data corresponding to the current training sample data, and P(π_t | X_s) is the probability of that target arrangement order for the data in the second output data.
  • Alternatively, the objective function of the student network is as shown in the foregoing formula (4), where π is a target arrangement order, X_s is the second output data corresponding to the current training sample data, X_t is the first output data corresponding to the current training sample data, P(π | X_s) is the probability that the data in the second output data of the current training sample data are in the arrangement order π, P(π | X_t) is the probability that the data in the first output data of the current training sample data are in the arrangement order π, and Q is the set of target arrangement orders.
  • That the processor 501 executes the at least one instruction to adjust the weights of the student network according to the value of the objective function specifically includes: adopting a preset gradient descent optimization algorithm and adjusting the weights of the student network according to the value of the objective function.
  • Before the processor 501 executes the at least one instruction to calculate the similarity between the data in the first output data and the similarity between the data in the second output data, the processor 501 further executes the at least one instruction to: process the first output data and the second output data by a downsampling algorithm and an interpolation algorithm, so that the spatial dimension of the first output data is consistent with the spatial dimension of the second output data, and the number of the first output data and the number of the second output data are both consistent with the number of the current training sample data.
  • the first specific network layer is an intermediate network layer or a last layer network layer in the teacher network; and the second specific network layer is an intermediate network layer or a last layer network layer of the student network.
  • An embodiment of the present invention further provides a storage medium (which may be a non-volatile machine readable storage medium) storing a computer program for neural network training.
  • The program has a code segment configured to perform the following steps: selecting a teacher network that implements the same function as the student network; and iteratively training the student network to obtain a target network based on matching the similarity between the data of the first output data and the data of the second output data corresponding to the same training sample data, so as to migrate the similarity between the output data of the teacher network to the student network; wherein the first output data is data output from the first specific network layer of the teacher network after the training sample data is input into the teacher network, and the second output data is data output from the second specific network layer of the student network after the training sample data is input into the student network.
  • An embodiment of the present invention further provides a computer program having a code segment configured to perform the following neural network training: selecting a teacher network that implements the same function as the student network; and iteratively training the student network to obtain a target network based on matching the similarity between the data of the first output data and the data of the second output data corresponding to the same training sample data, so as to migrate the similarity between the output data of the teacher network to the student network; wherein the first output data is data output from the first specific network layer of the teacher network after the training sample data is input into the teacher network, and the second output data is data output from the second specific network layer of the student network after the training sample data is input into the student network.
  • Through the above technical solution, the similarity between the data of the output data produced by the teacher network for the sample training data can be completely migrated to the student network, so that the result output by the target network for the training sample data is basically the same as the result output by the teacher network.
  • Owing to the good generalization ability of neural networks, the outputs of the target network and the teacher network are also basically the same on the test set, thereby improving the accuracy of the student network.
  • each functional unit in each embodiment of the present invention may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
  • The instruction device implements the functions specified in one or more flows of the flowchart and/or in one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or in one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed are a neural network training method and device, the method comprising: selecting a teacher network that implements the same functions as a student network (101); and iteratively training the student network to obtain a target network on the basis of matching the data similarity of first output data with the data similarity of second output data corresponding to the same training sample data, so as to migrate the output-data similarity from the teacher network to the student network (102), the first output data being data output from a first specific network layer of the teacher network after the training sample data is input into the teacher network, and the second output data being data output from a second specific network layer of the student network after the training sample data is input into the student network. The student network trained by this method according to the output-data similarity of the teacher network has better performance.
PCT/CN2017/102032 2017-06-15 2017-09-18 Neural network training method and device WO2018227800A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710450211.9A CN107358293B (zh) 2017-06-15 2017-06-15 一种神经网络训练方法及装置
CN201710450211.9 2017-06-15

Publications (1)

Publication Number Publication Date
WO2018227800A1 (fr)

Family

ID=60273856

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/102032 WO2018227800A1 (fr) 2017-06-15 2017-09-18 Procédé et dispositif d'apprentissage de réseau neuronal

Country Status (2)

Country Link
CN (2) CN110969250B (fr)
WO (1) WO2018227800A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291836A (zh) * 2020-03-31 2020-06-16 中国科学院计算技术研究所 一种生成学生网络模型的方法
CN111340221A (zh) * 2020-02-25 2020-06-26 北京百度网讯科技有限公司 神经网络结构的采样方法和装置
CN111435424A (zh) * 2019-01-14 2020-07-21 北京京东尚科信息技术有限公司 一种图像处理方法和设备
CN111444958A (zh) * 2020-03-25 2020-07-24 北京百度网讯科技有限公司 一种模型迁移训练方法、装置、设备及存储介质
CN111598213A (zh) * 2020-04-01 2020-08-28 北京迈格威科技有限公司 网络训练方法、数据识别方法、装置、设备和介质
US12033068B2 (en) 2018-06-22 2024-07-09 Advanced New Technologies Co., Ltd. Method and device for cash advance recognition

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304915B (zh) * 2018-01-05 2020-08-11 大国创新智能科技(东莞)有限公司 一种深度学习神经网络的分解与合成方法和系统
CN108830288A (zh) * 2018-04-25 2018-11-16 北京市商汤科技开发有限公司 图像处理方法、神经网络的训练方法、装置、设备及介质
CN108921282B (zh) * 2018-05-16 2022-05-31 深圳大学 一种深度神经网络模型的构建方法和装置
CN108764462A (zh) * 2018-05-29 2018-11-06 成都视观天下科技有限公司 一种基于知识蒸馏的卷积神经网络优化方法
CN110598504B (zh) * 2018-06-12 2023-07-21 北京市商汤科技开发有限公司 图像识别方法及装置、电子设备和存储介质
CN108830813B (zh) * 2018-06-12 2021-11-09 福建帝视信息科技有限公司 一种基于知识蒸馏的图像超分辨率增强方法
CN108898168B (zh) * 2018-06-19 2021-06-01 清华大学 用于目标检测的卷积神经网络模型的压缩方法和系统
CN108985920A (zh) * 2018-06-22 2018-12-11 阿里巴巴集团控股有限公司 套现识别方法和装置
CN109783824B (zh) * 2018-12-17 2023-04-18 北京百度网讯科技有限公司 基于翻译模型的翻译方法、装置及存储介质
CN109637546B (zh) * 2018-12-29 2021-02-12 苏州思必驰信息科技有限公司 知识蒸馏方法和装置
CN109840588B (zh) * 2019-01-04 2023-09-08 平安科技(深圳)有限公司 神经网络模型训练方法、装置、计算机设备及存储介质
CN109800821A (zh) * 2019-01-31 2019-05-24 北京市商汤科技开发有限公司 训练神经网络的方法、图像处理方法、装置、设备和介质
CN110009052B (zh) * 2019-04-11 2022-11-18 腾讯科技(深圳)有限公司 一种图像识别的方法、图像识别模型训练的方法及装置
CN110163344B (zh) * 2019-04-26 2021-07-09 北京迈格威科技有限公司 神经网络训练方法、装置、设备和存储介质
CN111401406B (zh) * 2020-02-21 2023-07-18 华为技术有限公司 一种神经网络训练方法、视频帧处理方法以及相关设备
CN112116441B (zh) * 2020-10-13 2024-03-12 腾讯科技(深圳)有限公司 金融风险分类模型的训练方法、分类方法、装置及设备
CN112712052A (zh) * 2021-01-13 2021-04-27 安徽水天信息科技有限公司 一种机场全景视频中微弱目标的检测识别方法
CN112365886B (zh) * 2021-01-18 2021-05-07 深圳市友杰智新科技有限公司 语音识别模型的训练方法、装置和计算机设备
CN113378940B (zh) * 2021-06-15 2022-10-18 北京市商汤科技开发有限公司 神经网络训练方法、装置、计算机设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090971A (zh) * 2014-07-17 2014-10-08 中国科学院自动化研究所 面向个性化应用的跨网络行为关联方法
CN104657596A (zh) * 2015-01-27 2015-05-27 中国矿业大学 一种基于模型迁移的大型新压缩机性能预测快速建模方法
CN105844331A (zh) * 2015-01-15 2016-08-10 富士通株式会社 神经网络系统及该神经网络系统的训练方法
US20170024641A1 (en) * 2015-07-22 2017-01-26 Qualcomm Incorporated Transfer learning in neural networks

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7062476B2 (en) * 2002-06-17 2006-06-13 The Boeing Company Student neural network
CN103020711A (zh) * 2012-12-25 2013-04-03 中国科学院深圳先进技术研究院 分类器训练方法及其系统
US20150046181A1 (en) * 2014-02-14 2015-02-12 Brighterion, Inc. Healthcare fraud protection and management
US20160328644A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Adaptive selection of artificial neural networks
CN105787513A (zh) * 2016-03-01 2016-07-20 南京邮电大学 多示例多标记框架下基于域适应迁移学习设计方法和系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090971A (zh) * 2014-07-17 2014-10-08 中国科学院自动化研究所 面向个性化应用的跨网络行为关联方法
CN105844331A (zh) * 2015-01-15 2016-08-10 富士通株式会社 神经网络系统及该神经网络系统的训练方法
CN104657596A (zh) * 2015-01-27 2015-05-27 中国矿业大学 一种基于模型迁移的大型新压缩机性能预测快速建模方法
US20170024641A1 (en) * 2015-07-22 2017-01-26 Qualcomm Incorporated Transfer learning in neural networks

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12033068B2 (en) 2018-06-22 2024-07-09 Advanced New Technologies Co., Ltd. Method and device for cash advance recognition
CN111435424A (zh) * 2019-01-14 2020-07-21 北京京东尚科信息技术有限公司 一种图像处理方法和设备
CN111340221A (zh) * 2020-02-25 2020-06-26 北京百度网讯科技有限公司 神经网络结构的采样方法和装置
CN111340221B (zh) * 2020-02-25 2023-09-12 北京百度网讯科技有限公司 神经网络结构的采样方法和装置
CN111444958A (zh) * 2020-03-25 2020-07-24 北京百度网讯科技有限公司 一种模型迁移训练方法、装置、设备及存储介质
CN111444958B (zh) * 2020-03-25 2024-02-13 北京百度网讯科技有限公司 一种模型迁移训练方法、装置、设备及存储介质
CN111291836A (zh) * 2020-03-31 2020-06-16 中国科学院计算技术研究所 一种生成学生网络模型的方法
CN111291836B (zh) * 2020-03-31 2023-09-08 中国科学院计算技术研究所 一种生成学生网络模型的方法
CN111598213A (zh) * 2020-04-01 2020-08-28 北京迈格威科技有限公司 网络训练方法、数据识别方法、装置、设备和介质
CN111598213B (zh) * 2020-04-01 2024-01-23 北京迈格威科技有限公司 网络训练方法、数据识别方法、装置、设备和介质

Also Published As

Publication number Publication date
CN110969250A (zh) 2020-04-07
CN107358293B (zh) 2021-04-02
CN110969250B (zh) 2023-11-10
CN107358293A (zh) 2017-11-17

Similar Documents

Publication Publication Date Title
WO2018227800A1 (fr) Neural network training method and device
US11651259B2 (en) Neural architecture search for convolutional neural networks
US11295208B2 (en) Robust gradient weight compression schemes for deep learning applications
CN108805258B (zh) 一种神经网络训练方法及其装置、计算机服务器
WO2018090706A1 (fr) Procédé et dispositif d'élagage de réseau neuronal
JP7110240B2 (ja) ニューラルネットワーク分類
WO2016062044A1 (fr) Procédé, dispositif et système d'apprentissage de paramètres de modèle
CN109697510B (zh) 具有神经网络的方法和装置
CN110503192A (zh) 资源有效的神经架构
US10534999B2 (en) Apparatus for classifying data using boost pooling neural network, and neural network training method therefor
WO2018227801A1 (fr) Procédé et dispositif de construction de réseau neuronal
CN113168559A (zh) 机器学习模型的自动化生成
US11093714B1 (en) Dynamic transfer learning for neural network modeling
EP3789928A2 (fr) Procédé et appareil de réseau neuronal
CN111008631A (zh) 图像的关联方法及装置、存储介质和电子装置
CN114072809A (zh) 经由神经架构搜索的小且快速的视频处理网络
US11822544B1 (en) Retrieval of frequency asked questions using attentive matching
EP4009239A1 (fr) Procédé et appareil de recherche d'architecture neurale basée de la performance matérielle
CN112052865A (zh) 用于生成神经网络模型的方法和装置
Wang et al. Towards efficient convolutional neural networks through low-error filter saliency estimation
CN116384471A (zh) 模型剪枝方法、装置、计算机设备、存储介质和程序产品
CN110457155A (zh) 一种样本类别标签的修正方法、装置及电子设备
US20220138554A1 (en) Systems and methods utilizing machine learning techniques for training neural networks to generate distributions
Culp et al. On adaptive regularization methods in boosting
CN114861671A (zh) 模型训练方法、装置、计算机设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17913592

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17913592

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.04.2020)
