CN109800821A - Method for training neural network, image processing method, apparatus, device, and medium - Google Patents

Method for training neural network, image processing method, apparatus, device, and medium

Info

Publication number
CN109800821A
Authority
CN
China
Prior art keywords
neural network
correlation
samples
sample
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910095785.8A
Other languages
Chinese (zh)
Inventor
彭宝云
金啸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN201910095785.8A priority Critical patent/CN109800821A/en
Publication of CN109800821A publication Critical patent/CN109800821A/en
Pending legal-status Critical Current

Landscapes

  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The embodiment of the present disclosure discloses a method for training a neural network, an image processing method, an apparatus, a device, and a medium. The method for training a neural network includes: determining, by a first neural network, a first degree of correlation between at least two samples in a sample image set; and training a second neural network based on a processing result of the first neural network on the at least two samples and the first degree of correlation between the at least two samples, to obtain a target neural network, wherein the network scale of the first neural network is larger than the network scale of the second neural network. The embodiment of the present disclosure can improve the performance of the neural network obtained by training.

Description

Method for training neural network, image processing method, apparatus, device, and medium
Technical Field
The present disclosure relates to computer vision technologies, and in particular, to a method of training a neural network, an image processing method, an apparatus, a device, and a medium.
Background
Deep neural networks achieve very good performance in many computer vision tasks. Generally, the larger the number of parameters and the amount of computation of a network, the better its performance. However, it is very difficult to deploy such a large-scale network on a resource-constrained embedded system, and the performance of a network obtained by directly training a small-scale network is far lower than that of a large-scale network. How to improve the performance of a smaller-scale network without increasing the training data is therefore a significant research topic.
Knowledge Distillation can train smaller-scale networks better by minimizing the cross entropy between the outputs of larger-scale and smaller-scale networks, reducing the performance gap between them. However, the performance of small-scale networks obtained by training with existing knowledge distillation methods still needs to be further improved.
Disclosure of Invention
The embodiment of the disclosure provides a technical scheme for training a neural network and processing an image.
According to an aspect of an embodiment of the present disclosure, there is provided a method of training a neural network, including:
determining a first degree of correlation between at least two samples in the sample image set by a first neural network;
and training a second neural network based on the processing result of the first neural network on the at least two samples and a first correlation degree between the at least two samples to obtain a target neural network, wherein the network scale of the first neural network is larger than that of the second neural network.
Optionally, in the above method embodiment of the present disclosure, the determining, by the first neural network, a first correlation between at least two samples in the sample image set includes:
performing feature extraction processing on each sample of the at least two samples through the first neural network to obtain first feature data of each sample;
and obtaining a first correlation degree between the two samples included in each sample pair according to the first characteristic data of the two samples included in each sample pair of at least one sample pair formed by the at least two samples.
Optionally, in any of the above method embodiments of the present disclosure, the first characteristic data includes data output by at least one of a last layer and an intermediate layer of the first neural network.
Optionally, in any one of the method embodiments of the present disclosure, the obtaining, according to the first feature data of the two samples included in each sample pair of at least one sample pair composed of the at least two samples, a first correlation between the two samples included in each sample pair includes:
and carrying out nonlinear mapping processing on first characteristic data of a first sample and a second sample included in the sample pair to obtain a first correlation between the first sample and the second sample.
Optionally, in any one of the method embodiments of the present disclosure, the training a second neural network based on the processing result of the first neural network on the at least two samples and a first correlation between the at least two samples to obtain a target neural network includes:
determining, by the second neural network, a second degree of correlation between the at least two samples;
obtaining a correlation degree loss value of the second neural network based on the first correlation degree and the second correlation degree;
and adjusting the network parameters of the second neural network according to the correlation degree loss value of the second neural network to obtain the target neural network.
Optionally, in any one of the method embodiments of the present disclosure, the training a second neural network based on the processing result of the first neural network on the at least two samples and a first correlation between the at least two samples to obtain a target neural network further includes:
obtaining a migration loss value of the second neural network based on the processing result of the first neural network on the at least two samples and the processing result of the second neural network on the at least two samples;
the adjusting the network parameters of the second neural network according to the correlation degree loss value of the second neural network to obtain the target neural network comprises:
and adjusting the network parameters of the second neural network based on the correlation degree loss value and the migration loss value to obtain the target neural network.
Optionally, in any one of the method embodiments of the present disclosure, the adjusting, according to the correlation loss value of the second neural network, a network parameter of the second neural network includes:
and adjusting network parameters of the second neural network according to the correlation degree loss value of the second neural network, so that the difference between the correlation degrees of the second neural network and the first neural network on the same pair of samples is reduced.
Optionally, in any of the above method embodiments of the present disclosure, the at least two samples belong to the same category.
According to another aspect of the embodiments of the present disclosure, there is provided an image processing method including:
acquiring an image to be processed;
and inputting the image to be processed into a target neural network for processing to obtain an image processing result, wherein the target neural network is obtained by training through the method of any one of the embodiments.
According to another aspect of the embodiments of the present disclosure, there is provided an apparatus for training a neural network, including:
the processing unit is used for determining a first correlation degree between at least two samples in the sample image set through a first neural network;
and the training unit is used for training a second neural network based on the processing result of the first neural network on the at least two samples and the first correlation between the at least two samples to obtain a target neural network, wherein the network scale of the first neural network is larger than that of the second neural network.
Optionally, in an embodiment of the apparatus of the present disclosure, the processing unit includes:
the characteristic extraction subunit is used for performing characteristic extraction processing on each sample of the at least two samples through the first neural network to obtain first characteristic data of each sample;
and the first correlation determining subunit is configured to obtain a first correlation between the two samples included in each sample pair according to the first feature data of the two samples included in each sample pair in at least one sample pair formed by the at least two samples.
Optionally, in any one of the apparatus embodiments of the present disclosure above, the first characteristic data includes data output by at least one of a last layer and an intermediate layer of the first neural network.
Optionally, in an embodiment of any one of the above apparatuses of the present disclosure, the first correlation determining subunit is configured to perform a non-linear mapping process on first feature data of a first sample and a second sample included in the sample pair, so as to obtain a first correlation between the first sample and the second sample.
Optionally, in any one of the apparatus embodiments of the present disclosure above, the training unit includes:
a second degree of correlation determination subunit for determining a second degree of correlation between the at least two samples by the second neural network;
a first loss value determining subunit, configured to obtain a correlation loss value of the second neural network based on the first correlation and the second correlation;
and the parameter value adjusting subunit is used for adjusting the network parameters of the second neural network according to the correlation degree loss value of the second neural network to obtain the target neural network.
Optionally, in any one of the apparatus embodiments of the present disclosure, the training unit further includes:
a second loss value determination subunit, configured to obtain a migration loss value of the second neural network based on a processing result of the first neural network on the at least two samples and a processing result of the second neural network on the at least two samples;
and the parameter value adjusting subunit is configured to adjust a network parameter of the second neural network based on the correlation loss value and the migration loss value, so as to obtain the target neural network.
Optionally, in an embodiment of the apparatus of the present disclosure, the parameter value adjusting subunit is configured to perform adjustment processing on the network parameters of the second neural network according to the correlation loss value of the second neural network, so that a difference between correlations obtained for the same pair of samples by the second neural network and the first neural network is reduced.
Optionally, in any of the above apparatus embodiments of the present disclosure, the at least two samples belong to the same category.
According to still another aspect of an embodiment of the present disclosure, there is provided an image processing apparatus including:
the acquisition unit is used for acquiring an image to be processed;
and the processing unit is used for inputting the image to be processed into a target neural network for processing to obtain an image processing result, wherein the target neural network is obtained by training through the device in any embodiment.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic device including the apparatus according to any of the embodiments.
According to still another aspect of an embodiment of the present disclosure, there is provided an electronic device including:
a memory for storing executable instructions; and
a processor configured to execute the executable instructions to perform the method according to any of the above embodiments.
According to yet another aspect of the embodiments of the present disclosure, there is provided a computer program comprising computer readable code which, when run on a device, executes instructions for implementing the method of any of the above embodiments.
According to yet another aspect of the embodiments of the present disclosure, there is provided a computer storage medium for storing computer-readable instructions, which when executed implement the method of any of the above embodiments.
Based on the method for training a neural network, the image processing method, the apparatus, the device, and the medium provided by the embodiments of the present disclosure, the correlation of samples in the feature space is modeled so as to effectively capture the correlation of the samples in the feature space of the first neural network. In the process of training the second neural network based on the processing results of the first neural network on the samples, the correlation of the samples in the feature space of the first neural network is used to constrain the feature space of the second neural network, so that the features learned by the trained target neural network have a discriminability consistent with that of the first neural network. When applied to approaches such as knowledge distillation or imitation learning, this allows a smaller-scale neural network to be trained better, improving the performance of the smaller-scale neural network.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram of a method of training a neural network according to some embodiments of the present disclosure;
FIG. 2 is a flow diagram of determining a first degree of correlation between two samples in a sample pair by a first neural network in accordance with some embodiments of the present disclosure;
FIG. 3 is a flow diagram of training a second neural network based on a first degree of correlation between two samples included in a sample pair according to some embodiments of the present disclosure;
FIG. 4 is a flow diagram of training a second neural network based on a first degree of correlation between at least two samples according to some embodiments of the present disclosure;
FIG. 5 is a schematic diagram of an example of a method of training a neural network provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of one embodiment of applying the method of training a neural network of the present disclosure to knowledge distillation;
FIG. 7A to 7C are schematic comparison diagrams of the visualization result of the sample image set in the feature space of the neural network;
FIG. 8 is a schematic structural diagram of an apparatus for training a neural network according to some embodiments of the present disclosure;
FIG. 9 is a schematic diagram of a processing unit according to some embodiments of the present disclosure;
FIG. 10 is a schematic diagram of a training unit according to some embodiments of the present disclosure;
FIG. 11 is a schematic diagram of a training unit according to further embodiments of the present disclosure;
FIG. 12 is a schematic structural diagram of an electronic device according to some embodiments of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to computer systems/servers that are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
The computer system/server may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Fig. 1 is a flow diagram of a method of training a neural network according to some embodiments of the present disclosure. The method may be performed by a terminal device or a server, for example: mobile terminal devices such as cameras, video cameras, mobile phones, and in-vehicle computers.
102, determining a first correlation between at least two samples in the sample image set by a first neural network.
In embodiments of the present disclosure, the first neural network may be a trained neural network, and the first neural network may be a larger-scale neural network, for example: the number of network parameters of the first neural network is greater than a certain value, but the embodiment of the disclosure does not limit this. Alternatively, the first neural network may be a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), or the like, and the type of the first neural network is not limited in the embodiments of the present disclosure. The first neural network may be a neural network adapted for different computer vision tasks, for example: a target recognition task, a target classification task, a target detection task or a posture estimation task, etc. The first neural network may also be a neural network suitable for different application scenarios; the embodiment of the disclosure does not limit the application range of the first neural network. Alternatively, the network structure of the first neural network may be designed according to the computer vision task, or the network structure of the first neural network may employ at least a part of an existing network structure, such as: a Deep Residual Network (ResNet), a Visual Geometry Group network (VGGNet), GoogLeNet, and the like; the network structure of the first neural network is not limited in the embodiment of the present disclosure.
Alternatively, the sample image set may be an existing sample image set, such as: a Megaface image set, an ImageNet image set or a CIFAR-100 image set and the like; alternatively, the sample image set may contain images collected from a network, an image acquisition device, and/or an image processing device, etc., such as: the source of the sample is not limited in the embodiment of the present disclosure, such as an original image such as a drawn picture, a photograph, a video frame, and/or an image obtained by performing data enhancement processing on the original image. In some embodiments, the samples in the sample image set are provided with annotation information that matches the corresponding computer vision task, such as: the annotation information may include category information, etc., but the embodiment of the present disclosure does not limit the specific implementation of the annotation information.
In 102, optionally, a correlation between two samples included in each of at least one sample pair in the sample image set may be determined, for example, a correlation between each two samples in the sample image set, or a correlation between some samples in the sample image set. As another example, the correlation between samples belonging to the same category in the sample image set is determined, or the correlation between samples belonging to different categories in the sample image set is determined, and so on. At least two samples may be randomly obtained from the sample image set, or at least two samples may be sequentially obtained from the sample image set according to an arrangement order in the sample image set.
In some embodiments, at least one sample comprised by the sample image set may be pre-processed before determining the correlation between two samples comprised in the sample pair. Optionally, the pre-processing comprises at least one of the following: scale adjustment, brightness adjustment, correction processing, clipping processing and the like. For example, the first neural network has a requirement on the size of the input image, at this time, if the size of the sample in the sample image set meets the size requirement of the first neural network on the input image, the sample obtained from the sample image set may be directly input to the first neural network, and if the size of the sample in the sample image set does not meet the size requirement of the first neural network on the input image, the sample may be input to the first neural network after being subjected to size adjustment processing.
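For instance, the size adjustment described above can be carried out with standard image transforms before a sample is fed to the first neural network. The following is a minimal sketch assuming a PyTorch/torchvision setup; the 224×224 target size and the brightness jitter amount are illustrative assumptions, not values specified by this disclosure.

```python
# Hypothetical preprocessing sketch (assumes PyTorch/torchvision; the target size and
# jitter amount are illustrative, not values from this disclosure).
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),           # scale adjustment to the assumed input size
    transforms.ColorJitter(brightness=0.2),  # optional brightness adjustment
    transforms.ToTensor(),
])

# sample_tensor = preprocess(sample_pil_image)  # ready to feed to the first neural network
```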
In embodiments of the present disclosure, the degree of correlation between samples may reflect the degree of semantic correlation between samples. In some embodiments, the correlation between the samples may be determined based on the corresponding positions of the samples in the feature space of the neural network, for example, first feature data of each sample in the sample pair in the feature space of the first neural network may be obtained, and a first correlation between two samples included in the sample pair may be determined according to the first feature data of two samples in the sample pair. In the embodiment of the present disclosure, the feature data may include at least one feature vector, at least one feature map, or other forms. Alternatively, a similarity calculation may be performed according to the first feature data of the two samples included in the sample pair to obtain a first correlation between the two samples, for example: and performing dot product operation on the first characteristic data of the two samples. Or, performing nonlinear mapping processing on the first feature data of two samples included in the sample pair to obtain a first correlation between the two samples, so as to introduce high-order information into the correlation by using the nonlinear mapping processing, thereby improving the accuracy of the correlation and the performance of the second neural network, for example: the first feature data of the two samples are processed through a kernel function (kernel) to obtain a correlation between the two samples, where the kernel function may adopt a dot product kernel function, a polynomial kernel function, or a radial basis kernel function, and the like.
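As an illustration of the correlation computations described above, the sketch below (assuming PyTorch; the function names are my own, not from this disclosure) computes a first correlation between the first feature data of two samples, once with a plain dot product and once with a radial basis (Gaussian) kernel as the non-linear mapping.

```python
# Illustrative correlation between two samples' first feature data (assumes PyTorch;
# function names are my own, not from this disclosure).
import torch

def dot_product_correlation(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """Similarity via the dot product of two flattened feature tensors."""
    return torch.dot(f1.flatten(), f2.flatten())

def rbf_correlation(f1: torch.Tensor, f2: torch.Tensor, gamma: float = 0.4) -> torch.Tensor:
    """Non-linear mapping via a radial basis (Gaussian) kernel, which introduces
    higher-order information into the correlation."""
    diff = f1.flatten() - f2.flatten()
    return torch.exp(-gamma * torch.dot(diff, diff))

# first_correlation = rbf_correlation(features_sample_a, features_sample_b)
```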
And 104, training a second neural network based on the processing result of the first neural network on the at least two samples and the first correlation between the at least two samples to obtain a target neural network.
In embodiments of the present disclosure, the second neural network may be a neural network to be trained, and the second neural network may be a smaller scale neural network, such as: the number of network parameters of the second neural network is less than a certain value, but the embodiment of the disclosure does not limit this. The network scale of the first neural network is larger than that of the second neural network, the first neural network can be a teacher network (teacher network), the second neural network can be a student network (student network), the teacher network is used for training the student network, and the performance of the trained student network can be improved. In an alternative example, the training of the second neural network may be performed using a knowledge distillation method or other methods, which are not limited by the embodiments of the present disclosure.
Alternatively, the second neural network may be the same type of neural network as the first neural network, or the second neural network may be a different type of neural network from the first neural network, and the type of the second neural network is not limited in the embodiments of the present disclosure, for example: the second neural network may be a convolutional neural network, a deep neural network, a cyclic neural network, or the like. The second neural network may be a neural network that is suitable for the same computer vision task as the first neural network, for example: a target recognition task, a target classification task, a target detection task or a posture estimation task, etc. The second neural network may also be a neural network that is suitable for the same application scenario as the first neural network, for example: the embodiment of the disclosure does not limit the application range of the second neural network. Alternatively, the second neural network may adopt a network structure identical or similar to the first neural network, or the second neural network may adopt a network structure completely different from the first neural network, and the network structure of the second neural network is not limited in the embodiments of the present disclosure. In an alternative example, the network structure of the second neural network may be designed according to computer vision tasks. In another alternative example, the second neural network may employ at least a portion of an existing network structure, such as: lightweight network structures such as SqueezeNet, MobileNet or ShuffleNet.
In some embodiments, the second neural network is trained based on a first degree of correlation between at least two samples. Optionally, the supervised information of the second neural network may be obtained based on a processing result of the first neural network on the at least two samples and a first correlation between the at least two samples. For example, the network parameters of the second neural network are adjusted according to the supervision information to obtain the target neural network. For another example, based on the monitoring information, a network loss value is obtained, and the network parameters of the second neural network are adjusted according to the network loss value to obtain the target neural network, and so on.
Based on the method for training a neural network provided by the embodiment of the disclosure, a first correlation between at least two samples in a sample image set is determined through a first neural network, and a second neural network with a smaller network scale than the first neural network is trained based on the processing result of the first neural network on the at least two samples and the first correlation between the at least two samples, obtaining a target neural network. By modeling the correlation of the samples in the feature space, the correlation of the samples in the feature space of the first neural network is effectively captured, and in the process of training the second neural network based on the processing result of the first neural network on the samples, the correlation of the samples in the feature space of the first neural network is used to constrain the feature space of the second neural network, so that the features learned by the trained target neural network can have a discriminability consistent with that of the first neural network. When applied to knowledge distillation or imitation learning, this allows the smaller-scale neural network to be trained better, improving the performance of the smaller-scale neural network.
Fig. 5 is a schematic diagram of an example of a method for training a neural network provided by an embodiment of the present disclosure. As shown in fig. 5, the first neural network employs ResNet-50 and the second neural network employs MobileNetV2. The first neural network includes: a convolutional layer (Convolutional Layer) conv1, convolutional modules (Convolutional Block) c2, c3, c4 and c5, a fully connected layer (Fully Connected Layer) fc, and a softmax function, wherein each convolutional module comprises two or more network layers including convolutional layers, and the composition structure of each convolutional module is similar. The samples are input at the convolutional layer conv1, sequentially processed by the convolutional modules c2, c3, c4, c5 and the fully connected layer fc, and output through the softmax function to obtain a prediction result (Predict). The second neural network includes: a convolutional layer conv1, bottleneck modules (Bottleneck Block) b1, b2, b3, b4, b5, b6 and b7, a fully connected layer fc, and a softmax function, wherein each bottleneck module comprises two or more network layers, and the composition structures of the bottleneck modules are similar. The samples are input at the convolutional layer conv1, sequentially processed by the bottleneck modules b1, b2, b3, b4, b5, b6 and b7 and the fully connected layer fc, and output through the softmax function to obtain a prediction result (Predict).
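A sketch of how the teacher/student pairing of this example could be instantiated, assuming the torchvision implementations of ResNet-50 and MobileNetV2 stand in for the first and second neural networks; the checkpoint path is hypothetical.

```python
# Teacher/student instantiation sketch (assumes torchvision models as stand-ins for the
# networks of Fig. 5; the checkpoint path is hypothetical).
import torch
import torchvision.models as models

teacher = models.resnet50()      # first neural network: conv1, modules c2-c5, fc, softmax
student = models.mobilenet_v2()  # second neural network: conv1, bottleneck blocks b1-b7, fc, softmax

# The teacher is assumed to be already trained; its weights would be loaded, e.g.:
# teacher.load_state_dict(torch.load("teacher_resnet50.pth"))
teacher.eval()  # the teacher stays fixed while the student is trained
```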
The process of determining the first correlation between two samples in a sample pair by the first neural network will be described in detail below with reference to the example shown in fig. 2.
202, performing feature extraction processing on each sample included in the sample pair through a first neural network to obtain first feature data of each sample.
The first neural network has a multi-layer network structure, and in some embodiments, in the process of performing the feature extraction process on the sample through the first neural network, data output by one network layer in the first neural network may be selected as the first feature data, that is, the first feature data includes data output by one layer in the first neural network. For example: data output by the last layer or some middle layer of the first neural network. In other embodiments, data output by multiple network layers in the first neural network may be selected as the first feature data, that is, the first feature data may include data output by multiple layers in the first neural network, for example: including data output by the last layer and at least one middle layer of the first neural network, or including data output by at least two middle layers, the embodiment of the disclosure does not limit the specific implementation of the first feature data. In an alternative example, as shown in fig. 5, the data output by the convolution modules c4, c5 and the full connection layer fc in the first neural network is selected as the first feature data.
And 204, obtaining a first correlation degree between two samples included in the sample pair according to the first characteristic data of each sample in the sample pair.
Optionally, in a case where data output by one layer in the first neural network is selected as the first feature data, a first correlation between two samples included in a sample pair may be obtained according to the first feature data of each sample in the sample pair, for example: and processing the first characteristic data of the first sample and the second sample by using the kernel function to obtain a first correlation between the first sample and the second sample. Optionally, in a case where data output from multiple layers in the first neural network is selected as the first feature data, a correlation between two samples included in the sample pair may be obtained according to the first feature data output from two samples in the sample pair respectively at the same layer of the first neural network, and the first correlation between two samples included in the sample pair may be obtained based on the correlations corresponding to the multiple layers, for example: the data output by the last layer and a certain middle layer of the first neural network are selected as first feature data, first middle correlation degrees can be obtained according to the first feature data which are respectively output by a first sample and a second sample which are included in a sample pair at the last layer of the first neural network, second middle correlation degrees can be obtained according to the first feature data which are respectively output by the first sample and the second sample which are included in the sample pair at the same middle layer of the first neural network, a correlation degree vector is obtained based on the first middle correlation degrees and the second middle correlation degrees, and the correlation degree vector is used as the first correlation degrees between the first sample and the second sample.
According to the embodiment of the disclosure, through modeling the correlation of the characteristic data output by the sample in one or more layers of the first neural network, in the process of training the second neural network based on the processing result of the first neural network on the sample, the correlation constraint on the sample can be introduced into one or more layers of the second neural network, so that the multiple correlation constraints can be utilized, and the performance of the target neural network obtained through training is further improved.
In the example shown in fig. 2, the first feature data is obtained by performing feature extraction processing on the sample by the first neural network. In other embodiments, the first characteristic data of the sample may be obtained from other devices or by other means, which is not limited by the embodiments of the disclosure.
The following describes in detail a procedure for training the second neural network based on the first correlation between two samples included in the sample pair, with reference to the example shown in fig. 3.
302, determining a second degree of correlation between the at least two samples by the second neural network.
In embodiments of the present disclosure, a second degree of correlation between the two samples included in each of the at least one sample pair may be determined. Optionally, the second degree of correlation may be determined in a similar manner to the first degree of correlation, which is not limited in this disclosure. For example: and performing feature extraction processing on each sample included in the sample pair through a second neural network to obtain second feature data of each sample, and obtaining a second correlation degree between the two samples included in the sample pair according to the second feature data of each sample in the sample pair.
The second neural network has a multi-layer network structure, and in some embodiments, in the process of performing the feature extraction processing on the sample through the second neural network, the network layer of the second neural network outputting the second feature data may be selected according to the network layer of the first neural network outputting the first feature data. Alternatively, in the case of selecting data output by one layer of the first neural network as the first feature data, data output by one network layer of the second neural network corresponding to the network layer of the first neural network outputting the first feature data may be selected as the second feature data, for example: and under the condition that the data output by the last layer of the first neural network is selected as the first characteristic data, the data output by the last layer of the second neural network is selected as the second characteristic data. Alternatively, in a case where data output from a plurality of layers in the first neural network is selected as the first feature data, data output from a plurality of network layers in the second neural network corresponding to the network layer in which the first neural network outputs the first feature data may be selected as the second feature data, for example: and under the condition that the data output by the last layer and a certain middle layer of the first neural network are selected as the first characteristic data, the data output by the last layer of the second neural network and a middle layer corresponding to the middle layer of the first neural network outputting the first characteristic data are selected as the second characteristic data. In an alternative example, as shown in fig. 5, in the case of selecting the data output by the convolution modules c4, c5 and the full connection layer fc in the first neural network as the first feature data, the data output by the bottleneck modules b4, b6 and the full connection layer fc in the second neural network is selected as the second feature data.
In some embodiments, the correlation between the samples is obtained in the same manner by the second neural network and the corresponding network layer in the first neural network. Optionally, in a case where data output by one layer in the second neural network is selected as the second feature data, the second correlation between two samples included in the sample pair may be obtained in the same manner as the first correlation obtained by the corresponding network layer in the first neural network according to the second feature data of each sample in the sample pair, for example: when the first feature data of the two samples in the sample pair is processed by using the kernel function to obtain the first correlation, the second feature data of the two samples in the sample pair may be processed by using the kernel function to obtain the second correlation between the two samples. Optionally, in a case where data output in multiple layers in the second neural network is selected as the first feature data, according to second feature data output by two samples in the sample pair respectively in the same layer of the second neural network, the same manner as that of obtaining the correlation by the corresponding network layer in the first neural network is adopted to obtain the correlation between the two samples included in the sample pair, and based on the correlation corresponding to the multiple layers respectively, the second correlation between the two samples included in the sample pair is obtained, for example: selecting data output by the last layer and a certain intermediate layer of the second neural network as second characteristic data, processing the second characteristic data output by the two samples in the sample pair respectively at the last layer of the second neural network by using the kernel function under the condition that the first characteristic data output by the two samples in the sample pair at the last layer of the first neural network is processed by using the kernel function to obtain a correlation degree, processing the second characteristic data output by the two samples in the sample pair respectively at the intermediate layers corresponding to the two samples in the second neural network by using the kernel function under the condition that the first characteristic data output by the two samples in the sample pair at the same intermediate layer of the first neural network is processed by using the kernel function to obtain a second intermediate correlation degree, and obtaining a correlation vector based on the first intermediate correlation degree and the second intermediate correlation degree, the correlation vector is taken as the second correlation between two samples.
And 304, obtaining a correlation degree loss value of the second neural network based on the first correlation degree and the second correlation degree.
The first correlation can be used as supervision information to train the correlation among the samples learned by the second neural network. And learning the sample correlation obtained by the first neural network through the second neural network so as to enable the sample correlation obtained by the first neural network and the second neural network to be close. Alternatively, the difference between the first correlation and the second correlation may be determined based on the distance or similarity between the first correlation and the second correlation, and the correlation loss value of the second neural network may be obtained according to the difference, for example: 1 norm, 2 norm or other forms of norms, etc., and the embodiments of the present disclosure do not limit the manner in which the correlation loss value is obtained. In some embodiments, the first correlation degree and the second correlation degree are obtained from data output by a plurality of network layers in the first neural network and the second neural network, respectively, in this case, a plurality of correlation degree loss values may be obtained from the first correlation degree and the second correlation degree obtained by the corresponding network layer in the first neural network and the second neural network, and then the correlation degree loss value of the second neural network may be obtained from the plurality of correlation degree loss values, for example: the correlation loss values of the second neural network may be obtained by summing or averaging the correlation loss values, which is not limited in the embodiment of the disclosure.
In an alternative example, as shown in fig. 5, the first correlation includes three correlations obtained from the data output from the convolution modules c4 and c5 and the full connection layer fc in the first neural network, and the second correlation includes three correlations obtained from the data output from the bottleneck modules b4 and b6 and the full connection layer fc in the second neural network, so that three correlation loss values CC loss can be obtained from the first correlation and the second correlation, and the correlation loss values CC loss can be summed up to obtain the correlation loss value of the second neural network.
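Putting the above together, the sketch below (assuming PyTorch; the squared-difference form of each per-layer loss is only one of the norms the disclosure allows, chosen here for illustration) computes a Gaussian-kernel correlation matrix per layer and sums the per-layer CC loss values into the correlation loss value of the second neural network.

```python
# Per-layer correlation (CC) loss sketch (assumes PyTorch; the layer pairing follows the
# c4/c5/fc vs. b4/b6/fc example of Fig. 5, and the squared-difference norm is illustrative).
import torch

def correlation_matrix(features: torch.Tensor, gamma: float = 0.4) -> torch.Tensor:
    """Gaussian-kernel correlation matrix of a batch of features: (n, ...) -> (n, n)."""
    y = features.flatten(start_dim=1)
    sq_dists = torch.cdist(y, y) ** 2       # squared pairwise distances
    return torch.exp(-gamma * sq_dists)

def cc_loss(teacher_feats: list, student_feats: list) -> torch.Tensor:
    """Sum of per-layer differences between teacher and student correlation matrices."""
    loss = torch.zeros(())
    for ft, fs in zip(teacher_feats, student_feats):   # e.g. [c4, c5, fc] vs. [b4, b6, fc]
        k_t = correlation_matrix(ft)
        k_s = correlation_matrix(fs)
        loss = loss + torch.mean((k_t - k_s) ** 2)     # one CC loss per matched layer pair
    return loss
```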
And 306, adjusting the network parameters of the second neural network according to the correlation degree loss value of the second neural network to obtain the target neural network.
A network loss value for the second neural network may be derived based on the correlation loss value. In some embodiments, the correlation loss value may be directly used as the network loss value of the second neural network. In other embodiments, a mission loss value is derived based on results of processing the samples by the second neural network, and a network loss value for the second neural network is determined based on the correlation loss value and the mission loss value. In other embodiments, a learning loss value for the second neural network is determined based on the results of the processing of the samples by the first neural network and the results of the processing of the samples by the second neural network, and a network loss value for the second neural network is determined based on the correlation loss value and the learning loss value. In other embodiments, a network loss value for the second neural network is determined based on the correlation loss value, the learning loss value, and the task loss value. Optionally, the network loss value of the second neural network may also contain other types of losses, which is not limited in this disclosure.
In an alternative example, the network parameters of the second neural network may be adjusted by a Stochastic Gradient Descent (SGD) method according to the network loss value of the second neural network, so that the difference between the correlations obtained by the second neural network and the first neural network for the same pair of samples is reduced. In some embodiments, when the second neural network reaches a preset network condition, the second neural network is determined to be the target neural network. The preset network condition may include at least one of a preset iteration number and a preset accuracy, and the embodiments of the present disclosure do not limit the type of the preset network condition and the implementation manner of adjusting the network parameter.
According to the embodiment of the disclosure, the loss value of the correlation degree of the second neural network is obtained according to the correlation degree of the sample in the feature space of the first neural network and the correlation degree of the sample in the feature space of the second neural network, the network parameter of the second neural network is adjusted according to the loss value of the correlation degree, and in the process of training the second neural network based on the processing result of the first neural network on the sample, the constraint on the correlation degree of the sample is added, so that the feature space of the target neural network obtained by the second neural network through training can be closer to the feature space of the first neural network, and the performance of the target neural network obtained through training is improved.
The following describes in detail, with reference to the example shown in fig. 4, the procedure of training the second neural network based on the first correlation between the at least two samples when the method for training a neural network provided by an embodiment of the present disclosure is applied to knowledge distillation.
402, deriving a migration loss value based on the processing result of the first neural network on the at least two samples and the processing result of the second neural network on the at least two samples.
In the embodiment of the present disclosure, feature extraction processing may be performed on each of the at least two samples through the first neural network, and the data output by each sample at the last layer of the first neural network may be selected and referred to as third feature data, serving as the processing result of the first neural network on the at least two samples; feature extraction processing may be performed on each of the at least two samples through the second neural network, and the data output by each sample at the last layer of the second neural network may be selected and referred to as fourth feature data, serving as the processing result of the second neural network on the at least two samples. A migration loss value of the second neural network may be derived based on the third feature data and the fourth feature data. In an alternative example, the migration loss value of the second neural network may include: a task loss value obtained according to the fourth feature data (namely, the predicted value) and the annotation information (namely, the ground-truth value) of the sample, and a learning loss value obtained according to the third feature data and the fourth feature data; for example, the task loss value and the learning loss value may be obtained by a cross entropy loss function, and the embodiment of the present disclosure does not limit the manner of obtaining the migration loss value of the second neural network. In an alternative example, as shown in fig. 5, the migration loss value KD loss of the second neural network is obtained from the data obtained by subjecting the output of the fully connected layer fc of the first neural network to a softmax function and the data obtained by subjecting the output of the fully connected layer fc of the second neural network to a softmax function.
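A sketch of the migration loss described above, assuming PyTorch: the task loss compares the student prediction with the annotation information, and the learning loss compares the softened teacher and student outputs. The temperature τ, the weighting λ_kd, and the use of a KL-divergence term as the cross entropy between soft outputs are illustrative choices, not values fixed by this disclosure.

```python
# Migration (KD) loss sketch (assumes PyTorch; tau, lambda_kd, and the KL-divergence form
# of the soft-target term are illustrative choices).
import torch.nn.functional as F

def kd_migration_loss(student_logits, teacher_logits, labels,
                      tau: float = 4.0, lambda_kd: float = 0.5):
    # Task loss: student prediction (fourth feature data) vs. annotation information.
    task_loss = F.cross_entropy(student_logits, labels)
    # Learning loss: softened teacher output serves as a soft target for the student.
    learning_loss = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * (tau * tau)
    return task_loss + lambda_kd * learning_loss
```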
404, determining a second degree of correlation between the at least two samples by the second neural network.
In the disclosed embodiment, the operation 404 may refer to the description of the operation 302 in fig. 3, and therefore will not be described herein.
And 406, obtaining a correlation degree loss value of the second neural network based on the first correlation degree and the second correlation degree.
In the disclosed embodiment, the operation 406 may refer to the description of the operation 304 in fig. 3, and therefore will not be described herein.
And 408, adjusting the network parameters of the second neural network based on the correlation degree loss value and the migration loss value to obtain the target neural network.
In an alternative example, the correlation loss value and the migration loss value of the second neural network may be weighted and summed, and network parameters of the second neural network are adjusted by a Stochastic Gradient Descent (SGD) method and the like according to the weighted sum of the correlation loss value and the migration loss value, so that a difference between correlations obtained for the same pair of samples by the second neural network and the first neural network is reduced, and when the second neural network reaches a preset network condition, the second neural network is determined as the target neural network. The preset network condition may include at least one of a preset iteration number and a preset accuracy, and the embodiments of the present disclosure do not limit the type of the preset network condition and the implementation manner of adjusting the network parameter.
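Combining the pieces above, a single training step might look like the following sketch (assuming PyTorch and the teacher, student, cc_loss, and kd_migration_loss sketched earlier; for brevity only the final-layer outputs are fed to the correlation loss, whereas the full method may also use matched intermediate layers, and λ_cc is an illustrative empirical value).

```python
# Single training-step sketch combining the correlation loss and the migration loss and
# adjusting the student by SGD (assumes PyTorch plus the teacher, student, cc_loss and
# kd_migration_loss sketched above; lambda_cc and the optimizer settings are illustrative).
import torch

optimizer = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)
lambda_cc = 0.02  # empirical weight of the correlation loss

def train_step(images, labels):
    with torch.no_grad():
        teacher_logits = teacher(images)        # the teacher is fixed
    student_logits = student(images)
    loss = (kd_migration_loss(student_logits, teacher_logits, labels)
            + lambda_cc * cc_loss([teacher_logits], [student_logits]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```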
In the embodiment of the disclosure, the training of the second neural network is supervised by the first neural network, so that the knowledge of the first neural network can be migrated to the second neural network and the target neural network obtained by training the second neural network outputs feature maps similar to those of the first neural network. By adding the correlation constraint on the samples in the training process, the features learned by the target neural network can have a discriminability consistent with that of the first neural network, thereby improving the performance of the target neural network obtained by training.
In an optional example, before determining the first correlation between at least two samples in the sample image set by the first neural network, the method may further include: and training the first neural network to obtain the operation of the trained first neural network. Optionally, the first neural network may be trained using the same sample as that used for training the second neural network, or may also be trained using a different sample from that used for training the second neural network, which is not limited in this disclosure.
In an alternative example, first, given a sample image set X = (x_1, x_2, ..., x_n) and its corresponding label information y = (y_1, y_2, ..., y_n), the neural network T is trained, and the trained neural network T is used as the first neural network; for example, SGD or Adam may be used to train the neural network T.
Then, the sample image set X is respectively input into the trained neural network T and the neural network S to be trained. From the feature map f_T obtained for each sample x_i in the sample image set X at the last network layer of the neural network T, a feature map set F_T is obtained; from the feature map f_S obtained for each sample x_i in the sample image set X at the last network layer of the neural network S, a feature map set F_S is obtained. The neural network S serves as the second neural network; for example, the neural network T is a larger network and the neural network S is a smaller network. For convenience of explanation, the feature map sets F_T and F_S are respectively converted into matrices Y_T and Y_S, which can be expressed as follows:

Y_T = matrix(F_T) ∈ R^(n×d)  (formula 1)

Y_S = matrix(F_S) ∈ R^(n×d)  (formula 2)

where each row in the matrices Y_T and Y_S represents a sample x_i, R is the feature space, d = c × h × w, c is the number of channels of the feature map, h is the height of the feature map, and w is the width of the feature map; the values of c, h, and w may be different for the neural networks T and S.
The correlations of the sample image set X in the feature spaces of the neural networks T and S are calculated from the matrices Y_T and Y_S respectively, obtaining correlation matrices k(Y_T, Y_T) and k(Y_S, Y_S). The calculation is as follows:

[k(Y_T, Y_T)]_ij = k(Y_Ti·, Y_Tj·)  (formula 3)

[k(Y_S, Y_S)]_ij = k(Y_Si·, Y_Sj·)  (formula 4)

where k is a kernel function, i = 1, ..., n, and j = 1, ..., n. In order to capture high-order correlation information, Taylor series expansions are performed on formulas 3 and 4, giving formulas 5 and 6, where the coefficient α_p is related to the selected kernel function k and is determined according to the kernel function k, and p is the order. Taking the radial basis kernel function k(x, y) = exp(-γ||x − y||^2) as an example, formulas 5 and 6 can be expressed as formulas 7 and 8. The parameter γ is an empirical value determined according to the user's requirements, and its value generally ranges from 0.04 to 1; p is generally taken as 2, that is, the Taylor expansion of the kernel function is carried out to order 2.
Then, the neural network S is trained based on the neural network T, and the migration loss value L_KD of the neural network S is determined as formula 9, where P_S is the probability output by the neural network S, the two terms of formula 9 are the first loss value and the second loss value respectively, and the coefficient λ_kd and the parameter τ (an integer) are determined according to the user's requirements.

The correlation constraint of the sample image set X in the feature space of the neural network is obtained according to formulas 7 and 8, and is expressed as a correlation loss value L_CC, given by formula 10.
combining equation 9 and equation 10, equation 11 is obtained as follows:
L=LKDccLCC(formula 11)
Wherein the coefficient lambdaccAs an empirical value, the correlation matrix k (Y) of the neural network S can be made by adding a correlation constraint in training the neural network S based on the neural network TS,YS) The correlation matrix k (Y) closer to the neural network TT,YT)。
As shown in fig. 6, fig. 6 is a schematic diagram of an embodiment in which the method for training a neural network of the present disclosure is applied to knowledge distillation. Samples in the sample image set are respectively input into a teacher network and a student network for processing, the correlation between the feature maps of the samples obtained after the teacher network and the student network process the samples is determined, and, according to this correlation, a correlation constraint is added in the process of training the student network based on the teacher network, so that the feature space of the trained student network and the feature space of the teacher network have consistent correlation.
As shown in figs. 7A to 7C: fig. 7A is the visualization result of the sample image set in the feature space of the teacher network, and it can be seen that the sample image set exhibits, in the feature space of the teacher network, the characteristic that samples of the same class are grouped apart from samples of different classes. Fig. 7B is the visualization result of the sample image set in the feature space of a student network trained by existing knowledge distillation; because existing knowledge distillation does not consider the correlation of the sample image set in the feature space, which can reflect the difference of the network's responses to different samples, the sample image set does not exhibit, in the feature space of the trained student network, the characteristic that the same class is separated from the different classes. Fig. 7C is the visualization result of the sample image set in the feature space of a student network trained by applying the method for training a neural network of the present disclosure to knowledge distillation; because the correlation constraint of the sample image set in the feature space is added in knowledge distillation, the sample image set exhibits, in the feature space of the trained student network, the characteristic, consistent with the teacher network, that samples of the same class cluster closely and are separated from different classes.
In addition, some embodiments of the present disclosure also provide an image processing method, including: acquiring an image to be processed; and inputting the image to be processed into a target neural network for processing to obtain an image processing result, wherein the target neural network is obtained by training through the method of any one of the embodiments.
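A brief usage sketch of the resulting target neural network is given below; the checkpoint path, preprocessing, and input size are hypothetical and only illustrate applying the trained network to an image to be processed:

```python
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical example: load a trained target (student) network saved as a full module
# and apply it to one image to be processed.
target_net = torch.load("target_student.pt", map_location="cpu")
target_net.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),   # assumed input size of the target network
    transforms.ToTensor(),
])

image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    result = target_net(image)       # image processing result, e.g. classification scores
```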
Fig. 8 is a schematic structural diagram of an apparatus for training a neural network according to some embodiments of the present disclosure. The apparatus may be disposed in a terminal device or a server, and may perform the method for training a neural network according to any of the above embodiments. As shown in fig. 8, the apparatus includes: a processing unit 810 and a training unit 820. Wherein,
a processing unit 810 for determining a first correlation between at least two samples in the sample image set by means of a first neural network.
A training unit 820, configured to train a second neural network based on a processing result of the first neural network on the at least two samples and a first correlation between the at least two samples, so as to obtain a target neural network.
In an embodiment of the disclosure, the network size of the first neural network is larger than the network size of the second neural network. The description of the processing unit 810 may refer to the description of the operation 102 in fig. 1, and the description of the training unit 820 may refer to the description of the operation 104 in fig. 1, and therefore, the description thereof is not repeated here.
Based on the apparatus for training a neural network provided by the embodiments of the present disclosure, a first correlation between at least two samples in a sample image set is determined through a first neural network, and a second neural network with a smaller network scale than the first neural network is trained based on the processing result of the first neural network on the at least two samples and the first correlation between the at least two samples, to obtain a target neural network. By modeling the correlation of the samples in the feature space, the correlation of the samples in the feature space of the first neural network is effectively captured; in the process of training the second neural network based on the processing result of the first neural network on the samples, the feature space of the second neural network is constrained by the correlation of the samples in the feature space of the first neural network, so that the features learned by the trained target neural network can have differentiability consistent with that of the first neural network. When applied to knowledge distillation or imitation learning, a neural network of smaller scale can thus be trained better, improving the performance of the smaller-scale neural network.
Fig. 9 is a schematic structural diagram of a processing unit according to some embodiments of the present disclosure. As shown in fig. 9, the processing unit includes: a feature extraction sub-unit 910 and a first correlation degree determination sub-unit 920. Wherein,
the feature extraction subunit 910 is configured to perform feature extraction processing on each of the at least two samples through a first neural network, so as to obtain first feature data of each sample.
The first correlation determining subunit 920 is configured to obtain a first correlation between the two samples included in each sample pair according to first feature data of the two samples included in each sample pair in at least one sample pair formed by the at least two samples.
Optionally, the first feature data includes data output from at least one of the last layer and the middle layer of the first neural network, and the first correlation determination subunit 920 is configured to perform a non-linear mapping process on the first feature data of the first sample and the second sample included in the sample pair, so as to obtain a first correlation between the first sample and the second sample. The description about the feature extraction subunit 910 may refer to the description about the operation 202 in fig. 2, and the description about the first correlation degree determination subunit 920 may refer to the description about the operation 204 in fig. 2, and therefore, the description thereof is not repeated here.
According to the embodiment of the disclosure, through modeling the correlation of the characteristic data output by the sample in one or more layers of the first neural network, in the process of training the second neural network based on the processing result of the first neural network on the sample, the correlation constraint on the sample can be introduced into one or more layers of the second neural network, so that the multiple correlation constraints can be utilized, and the performance of the target neural network obtained through training is further improved.
Fig. 10 is a schematic structural diagram of a training unit according to some embodiments of the present disclosure. As shown in fig. 10, the training unit includes: a second degree of correlation determination subunit 1010, a first loss value determination subunit 1020, and a parameter value adjustment subunit 1030. Wherein,
a second degree of correlation determining subunit 1010, configured to determine a second degree of correlation between the at least two samples through a second neural network.
A first loss value determining subunit 1020, configured to obtain a correlation loss value of the second neural network based on the first correlation and the second correlation.
And a parameter value adjusting subunit 1030, configured to adjust a network parameter of the second neural network according to the correlation loss value of the second neural network, so as to obtain a target neural network.
In the embodiment of the present disclosure, the parameter value adjusting subunit 1030 is configured to perform adjustment processing on the network parameters of the second neural network according to the correlation loss value of the second neural network, so that a difference between the correlations obtained for the same pair of samples by the second neural network and the first neural network is reduced. The description about the second correlation degree determining subunit 1010 may refer to the description about the operation 302 in fig. 3, the description about the first loss value determining subunit 1020 may refer to the description about the operation 304 in fig. 3, and the description about the parameter value adjusting subunit 1030 may refer to the description about the operation 306 in fig. 3, and therefore will not be described here.
According to the embodiment of the disclosure, a correlation loss value of the second neural network is obtained according to the correlation of the sample in the feature space of the first neural network and the correlation in the feature space of the second neural network, the network parameters of the second neural network are adjusted according to the correlation loss value, and in the process of training the second neural network based on the processing result of the first neural network on the sample, correlation constraint on the sample is added, so that the feature space of the target neural network obtained by the second neural network through training can be closer to the feature space of the first neural network, and the performance of the target neural network obtained through training is improved.
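Combining the sketches given earlier (to_feature_matrix, rbf_correlation_matrix, and total_loss), one training step of the second (student) network under the first (teacher) network's supervision might look as follows; the assumption that each network returns both logits and last-layer feature maps, as well as the hyper-parameter values, are illustrative only:

```python
import torch

def train_step(teacher, student, optimizer, images, labels,
               lambda_kd=1.0, tau=4.0, lambda_cc=0.02, gamma=0.4):
    # Assumption: calling each network returns (logits, last-layer feature maps).
    teacher.eval()
    with torch.no_grad():
        logits_T, feat_T = teacher(images)
    logits_S, feat_S = student(images)

    # First and second correlation between the samples of the batch (cf. formulas 3-8).
    k_T = rbf_correlation_matrix(to_feature_matrix(feat_T), gamma)
    k_S = rbf_correlation_matrix(to_feature_matrix(feat_S), gamma)

    # Combined loss as in formula 11: migration loss plus weighted correlation loss.
    loss = total_loss(logits_S, logits_T, labels, k_S, k_T, lambda_kd, tau, lambda_cc)

    optimizer.zero_grad()
    loss.backward()       # gradients flow only through the student, since the teacher ran under no_grad
    optimizer.step()      # adjusts the network parameters of the second neural network
    return loss.item()

# Example optimizer (illustrative): torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)
```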
Fig. 11 is a schematic structural diagram of a training unit according to other embodiments of the present disclosure. As shown in fig. 11, the training unit includes: a second degree of correlation determination subunit 1110, a first loss value determination subunit 1120, a parameter value adjustment subunit 1130, and a second loss value determination subunit 1140. Wherein,
a second loss value determining subunit 1140, configured to obtain a migration loss value based on the processing result of the first neural network for the at least two samples and the processing result of the second neural network for the at least two samples.
A second degree of correlation determining subunit 1110, configured to determine a second degree of correlation between the at least two samples through a second neural network.
A first loss value determining subunit 1120, configured to obtain a correlation loss value of the second neural network based on the first correlation and the second correlation.
And a parameter value adjusting subunit 1130, configured to adjust a network parameter of the second neural network based on the correlation loss value and the migration loss value, to obtain a target neural network.
In the embodiment of the present disclosure, the parameter value adjusting subunit 1130 is configured to perform an adjusting process on the network parameters of the second neural network according to the correlation loss value of the second neural network, so that a difference between the correlations obtained for the same pair of samples by the second neural network and the first neural network is reduced. The description about the second loss value determining subunit 1140 may refer to the description about the operation 402 in fig. 4, the description about the second correlation degree determining subunit 1110 may refer to the description about the operation 404 in fig. 4, the description about the first loss value determining subunit 1120 may refer to the description about the operation 406 in fig. 4, and the description about the parameter value adjusting subunit 1130 may refer to the description about the operation 408 in fig. 4, and therefore will not be described again here.
In the embodiment of the disclosure, the training of the second neural network is supervised by the first neural network, the knowledge of the first neural network can be migrated to the second neural network, so that the target neural network obtained by the training of the second neural network outputs the feature map similar to that of the first neural network, and by adding the correlation constraint of the sample in the training process, the features learned by the target neural network can have the differentiability consistent with that of the first neural network, thereby improving the performance of the target neural network obtained by training.
In addition, some embodiments of the present disclosure also provide an image processing apparatus including: the acquisition unit is used for acquiring an image to be processed; and the processing unit is used for inputting the image to be processed into a target neural network for processing to obtain an image processing result, wherein the target neural network is obtained by training through the device of any one of the embodiments.
The embodiments of the present disclosure also provide an electronic device, which may be a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like. Referring now to fig. 12, there is shown a schematic diagram of an electronic device 1200 suitable for implementing a terminal device or server of an embodiment of the disclosure. As shown in fig. 12, the computer system 1200 includes one or more processors, a communication section, and the like, for example: one or more Central Processing Units (CPU) 1201, and/or one or more acceleration units 1213, etc.; the acceleration units 1213 may include, but are not limited to, a GPU, an FPGA, and other types of special-purpose processors, and the processor may perform various appropriate actions and processes according to executable instructions stored in a Read Only Memory (ROM) 1202 or loaded from a storage portion 1208 into a Random Access Memory (RAM) 1203. The communication section 1212 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card. The processor may communicate with the read-only memory 1202 and/or the random access memory 1203 to execute the executable instructions, connect with the communication section 1212 through the bus 1204, and communicate with other target devices through the communication section 1212, thereby completing the operations corresponding to any one of the methods provided by the embodiments of the disclosure.
Further, in the RAM 1203, various programs and data necessary for the operation of the device may also be stored. The CPU 1201, the ROM 1202, and the RAM 1203 are connected to each other through the bus 1204. In the case where the RAM 1203 is present, the ROM 1202 is an optional module. The RAM 1203 stores executable instructions, or executable instructions are written into the ROM 1202 at runtime, and the executable instructions cause the central processing unit 1201 to perform the operations corresponding to the above-described method. An input/output (I/O) interface 1205 is also connected to the bus 1204. The communication section 1212 may be integrated, or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) connected to the bus link.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output portion 1207 including a display device such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. A driver 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1210 as necessary, so that a computer program read out therefrom is mounted into the storage section 1208 as necessary.
It should be noted that the architecture shown in fig. 12 is only an optional implementation manner, and in a specific practical process, the number and types of the components in fig. 12 may be selected, deleted, added or replaced according to actual needs; in different functional component settings, implementation manners such as a separate setting or an integrated setting may also be adopted, for example, the acceleration unit 1213 and the CPU1201 may be separately provided or the acceleration unit 1213 may be integrated on the CPU1201, the communication portion 1209 may be separately provided, or may be integrally provided on the CPU1201 or the acceleration unit 1213, and the like. These alternative embodiments are all within the scope of the present disclosure.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing a method as illustrated in the flow chart, the program code may include instructions corresponding to performing the method steps provided by embodiments of the present disclosure, e.g., determining a first degree of correlation between at least two samples in a sample image set by a first neural network; and training a second neural network based on the processing result of the first neural network on the at least two samples and a first correlation degree between the at least two samples to obtain a target neural network, wherein the network scale of the first neural network is larger than that of the second neural network. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211. The above-described functions defined in the method of the present disclosure are performed when the computer program is executed by a Central Processing Unit (CPU) 1201.
In one or more alternative embodiments, the disclosed embodiments also provide a computer program product for storing computer readable instructions that, when executed, cause a computer to perform the method of any of the possible implementations described above.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative example, the computer program product is embodied in a computer storage medium, and in another alternative example, the computer program product is embodied in a software product, such as a Software Development Kit (SDK), or the like.
In one or more optional implementation manners, the present disclosure also provides a method for training a neural network, and a corresponding apparatus and electronic device, a computer storage medium, a computer program, and a computer program product thereof, wherein the method includes: determining a first degree of correlation between at least two samples in the sample image set by a first neural network; and training a second neural network based on the processing result of the first neural network on the at least two samples and a first correlation degree between the at least two samples to obtain a target neural network, wherein the network scale of the first neural network is larger than that of the second neural network.
In some embodiments, the indication to train the neural network may be embodied as a call instruction, and the first device may instruct the second device to execute the neural network by calling, and accordingly, in response to receiving the call instruction, the second device may perform the steps and/or processes of any of the embodiments of the method to train the neural network.
It is to be understood that the terms "first," "second," and the like in the embodiments of the present disclosure are used for distinguishing and not limiting the embodiments of the present disclosure.
It is also understood that in the present disclosure, "plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in this disclosure is generally to be construed as one or more, unless explicitly stated otherwise or indicated to the contrary hereinafter.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
The methods and apparatus, devices of the present disclosure may be implemented in a number of ways. For example, the methods and apparatuses, devices of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A method of training a neural network, comprising:
determining a first degree of correlation between at least two samples in the sample image set by a first neural network;
and training a second neural network based on the processing result of the first neural network on the at least two samples and a first correlation degree between the at least two samples to obtain a target neural network, wherein the network scale of the first neural network is larger than that of the second neural network.
2. The method of claim 1, wherein determining a first degree of correlation between at least two samples in the sample image set by the first neural network comprises:
performing feature extraction processing on each sample of the at least two samples through the first neural network to obtain first feature data of each sample;
and obtaining a first correlation degree between the two samples included in each sample pair according to the first characteristic data of the two samples included in each sample pair of at least one sample pair formed by the at least two samples.
3. The method according to claim 2, wherein obtaining the first correlation between the two samples included in each sample pair according to the first feature data of the two samples included in each sample pair of the at least one sample pair comprises:
and carrying out nonlinear mapping processing on first characteristic data of a first sample and a second sample included in the sample pair to obtain a first correlation between the first sample and the second sample.
4. The method according to any one of claims 1 to 3, wherein training a second neural network based on the processing result of the first neural network on the at least two samples and a first correlation between the at least two samples to obtain a target neural network comprises:
determining, by the second neural network, a second degree of correlation between the at least two samples;
obtaining a correlation degree loss value of the second neural network based on the first correlation degree and the second correlation degree;
and adjusting the network parameters of the second neural network according to the correlation degree loss value of the second neural network to obtain the target neural network.
5. An image processing method, comprising:
acquiring an image to be processed;
inputting the image to be processed into a target neural network for processing to obtain an image processing result, wherein the target neural network is obtained by training according to the method of any one of claims 1 to 4.
6. An apparatus for training a neural network, comprising:
the processing unit is used for determining a first correlation degree between at least two samples in the sample image set through a first neural network;
and the training unit is used for training a second neural network based on the processing result of the first neural network on the at least two samples and the first correlation between the at least two samples to obtain a target neural network, wherein the network scale of the first neural network is larger than that of the second neural network.
7. An image processing apparatus characterized by comprising:
the acquisition unit is used for acquiring an image to be processed;
a processing unit, configured to input the image to be processed into a target neural network for processing, so as to obtain an image processing result, where the target neural network is obtained by training the apparatus according to claim 6.
8. An electronic device, characterized in that it comprises the apparatus of claim 6 or 7.
9. An electronic device, comprising:
a memory for storing executable instructions; and
a processor for executing the executable instructions to perform the method of any one of claims 1 to 5.
10. A computer storage medium storing computer readable instructions that, when executed, implement the method of any one of claims 1 to 5.
CN201910095785.8A 2019-01-31 2019-01-31 Method, image processing method, device, equipment and the medium of training neural network Pending CN109800821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910095785.8A CN109800821A (en) 2019-01-31 2019-01-31 Method, image processing method, device, equipment and the medium of training neural network

Publications (1)

Publication Number Publication Date
CN109800821A true CN109800821A (en) 2019-05-24

Family

ID=66559209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910095785.8A Pending CN109800821A (en) 2019-01-31 2019-01-31 Method, image processing method, device, equipment and the medium of training neural network

Country Status (1)

Country Link
CN (1) CN109800821A (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263842A (en) * 2019-06-17 2019-09-20 北京影谱科技股份有限公司 For the neural network training method of target detection, device, equipment, medium
CN110458201A (en) * 2019-07-17 2019-11-15 北京科技大学 A kind of remote sensing image object-oriented classification method and sorter
CN110472681A (en) * 2019-08-09 2019-11-19 北京市商汤科技开发有限公司 The neural metwork training scheme and image procossing scheme of knowledge based distillation
CN110598603A (en) * 2019-09-02 2019-12-20 深圳力维智联技术有限公司 Face recognition model acquisition method, device, equipment and medium
CN110909815A (en) * 2019-11-29 2020-03-24 深圳市商汤科技有限公司 Neural network training method, neural network training device, neural network processing device, neural network training device, image processing device and electronic equipment
CN111382870A (en) * 2020-03-06 2020-07-07 商汤集团有限公司 Method and device for training neural network
CN111582101A (en) * 2020-04-28 2020-08-25 中国科学院空天信息创新研究院 Remote sensing image detection method and system
CN111598213A (en) * 2020-04-01 2020-08-28 北京迈格威科技有限公司 Network training method, data identification method, device, equipment and medium
CN111797737A (en) * 2020-06-22 2020-10-20 重庆高新区飞马创新研究院 Remote sensing target detection method and device
CN111814717A (en) * 2020-07-17 2020-10-23 腾讯科技(深圳)有限公司 Face recognition method and device and electronic equipment
CN111882048A (en) * 2020-09-28 2020-11-03 深圳追一科技有限公司 Neural network structure searching method and related equipment
CN111967597A (en) * 2020-08-18 2020-11-20 上海商汤临港智能科技有限公司 Neural network training and image classification method, device, storage medium and equipment
CN112052945A (en) * 2019-06-06 2020-12-08 北京地平线机器人技术研发有限公司 Neural network training method, neural network training device and electronic equipment
CN113378940A (en) * 2021-06-15 2021-09-10 北京市商汤科技开发有限公司 Neural network training method and device, computer equipment and storage medium
CN113487614A (en) * 2021-09-08 2021-10-08 四川大学 Training method and device for fetus ultrasonic standard section image recognition network model
CN113837396A (en) * 2021-09-26 2021-12-24 中国联合网络通信集团有限公司 Equipment simulation learning method based on B-M2M, MEC and storage medium
WO2022037165A1 (en) * 2020-08-21 2022-02-24 苏州浪潮智能科技有限公司 Method, system and device for performing knowledge distillation on neural network model, and medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229673A (en) * 2016-12-27 2018-06-29 北京市商汤科技开发有限公司 Processing method, device and the electronic equipment of convolutional neural networks
US20180268265A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Recognition in unlabeled videos with domain adversarial learning and knowledge distillation
CN107358293A (en) * 2017-06-15 2017-11-17 北京图森未来科技有限公司 A kind of neural network training method and device
CN108830288A (en) * 2018-04-25 2018-11-16 北京市商汤科技开发有限公司 Image processing method, the training method of neural network, device, equipment and medium
CN108985190A (en) * 2018-06-28 2018-12-11 北京市商汤科技开发有限公司 Target identification method and device, electronic equipment, storage medium, program product

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Adriana Romero et al., "FitNets: Hints for Thin Deep Nets", ICLR 2015 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052945A (en) * 2019-06-06 2020-12-08 北京地平线机器人技术研发有限公司 Neural network training method, neural network training device and electronic equipment
CN112052945B (en) * 2019-06-06 2024-04-16 北京地平线机器人技术研发有限公司 Neural network training method, neural network training device and electronic equipment
CN110263842B (en) * 2019-06-17 2022-04-05 北京影谱科技股份有限公司 Neural network training method, apparatus, device, and medium for target detection
CN110263842A (en) * 2019-06-17 2019-09-20 北京影谱科技股份有限公司 For the neural network training method of target detection, device, equipment, medium
CN110458201A (en) * 2019-07-17 2019-11-15 北京科技大学 A kind of remote sensing image object-oriented classification method and sorter
CN110472681A (en) * 2019-08-09 2019-11-19 北京市商汤科技开发有限公司 The neural metwork training scheme and image procossing scheme of knowledge based distillation
CN110598603A (en) * 2019-09-02 2019-12-20 深圳力维智联技术有限公司 Face recognition model acquisition method, device, equipment and medium
CN110909815B (en) * 2019-11-29 2022-08-12 深圳市商汤科技有限公司 Neural network training method, neural network training device, neural network processing device, neural network training device, image processing device and electronic equipment
CN110909815A (en) * 2019-11-29 2020-03-24 深圳市商汤科技有限公司 Neural network training method, neural network training device, neural network processing device, neural network training device, image processing device and electronic equipment
CN111382870A (en) * 2020-03-06 2020-07-07 商汤集团有限公司 Method and device for training neural network
CN111598213B (en) * 2020-04-01 2024-01-23 北京迈格威科技有限公司 Network training method, data identification method, device, equipment and medium
CN111598213A (en) * 2020-04-01 2020-08-28 北京迈格威科技有限公司 Network training method, data identification method, device, equipment and medium
CN111582101B (en) * 2020-04-28 2021-10-01 中国科学院空天信息创新研究院 Remote sensing image target detection method and system based on lightweight distillation network
CN111582101A (en) * 2020-04-28 2020-08-25 中国科学院空天信息创新研究院 Remote sensing image detection method and system
CN111797737A (en) * 2020-06-22 2020-10-20 重庆高新区飞马创新研究院 Remote sensing target detection method and device
CN111814717B (en) * 2020-07-17 2022-09-27 腾讯科技(深圳)有限公司 Face recognition method and device and electronic equipment
CN111814717A (en) * 2020-07-17 2020-10-23 腾讯科技(深圳)有限公司 Face recognition method and device and electronic equipment
CN111967597A (en) * 2020-08-18 2020-11-20 上海商汤临港智能科技有限公司 Neural network training and image classification method, device, storage medium and equipment
WO2022037165A1 (en) * 2020-08-21 2022-02-24 苏州浪潮智能科技有限公司 Method, system and device for performing knowledge distillation on neural network model, and medium
CN111882048A (en) * 2020-09-28 2020-11-03 深圳追一科技有限公司 Neural network structure searching method and related equipment
CN113378940B (en) * 2021-06-15 2022-10-18 北京市商汤科技开发有限公司 Neural network training method and device, computer equipment and storage medium
CN113378940A (en) * 2021-06-15 2021-09-10 北京市商汤科技开发有限公司 Neural network training method and device, computer equipment and storage medium
CN113487614A (en) * 2021-09-08 2021-10-08 四川大学 Training method and device for fetus ultrasonic standard section image recognition network model
CN113837396A (en) * 2021-09-26 2021-12-24 中国联合网络通信集团有限公司 Equipment simulation learning method based on B-M2M, MEC and storage medium
CN113837396B (en) * 2021-09-26 2023-08-04 中国联合网络通信集团有限公司 B-M2M-based device simulation learning method, MEC and storage medium

Similar Documents

Publication Publication Date Title
CN109800821A (en) Method, image processing method, device, equipment and the medium of training neural network
CN111191791B (en) Picture classification method, device and equipment based on machine learning model
CN108460338B (en) Human body posture estimation method and apparatus, electronic device, storage medium, and program
US11227187B1 (en) Generating artificial intelligence solutions using raw data and simulated data
EP3779774B1 (en) Training method for image semantic segmentation model and server
US11693901B2 (en) Systems and methods for geolocation prediction
KR102318772B1 (en) Domain Separation Neural Networks
CN108399383B (en) Expression migration method, device storage medium, and program
CN108304775B (en) Remote sensing image recognition method and device, storage medium and electronic equipment
CN108280451B (en) Semantic segmentation and network training method and device, equipment and medium
CN110431560B (en) Target person searching method, device, equipment and medium
CN113139628B (en) Sample image identification method, device and equipment and readable storage medium
CN110929839B (en) Method and device for training neural network, electronic equipment and computer storage medium
CN113570064A (en) Method and system for performing predictions using a composite machine learning model
CN108230346B (en) Method and device for segmenting semantic features of image and electronic equipment
US11164306B2 (en) Visualization of inspection results
US20190087683A1 (en) Method and apparatus for outputting information
CN111325190A (en) Expression recognition method and device, computer equipment and readable storage medium
CN113011568A (en) Model training method, data processing method and equipment
CN114359289A (en) Image processing method and related device
CN114282258A (en) Screen capture data desensitization method and device, computer equipment and storage medium
CN113159315A (en) Neural network training method, data processing method and related equipment
CN112861601A (en) Method for generating confrontation sample and related equipment
CN117037244A (en) Face security detection method, device, computer equipment and storage medium
CN114049502B (en) Neural network training, feature extraction and data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20190524