CN111242303A - Network training method and device, and image processing method and device

Info

Publication number: CN111242303A
Application number: CN202010038819.2A
Authority: CN
Inventors: 徐国栋; 刘子纬; 李晓潇; 吕健勤
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2020-01-14
Filing date: 2020-01-14
Publication date: 2020-06-05
Anticipated expiration: 2040-01-14
Also published as: CN111242303B

Abstract

The disclosure relates to a network training method and device and an image processing method and device. The network training method comprises the following steps: respectively inputting a sample image in a training set and a transformed image of the sample image into a first network for processing to obtain a first processing result of the sample image and a second processing result of the transformed image, wherein the transformed image is obtained by performing geometric transformation on the sample image; inputting the transformed image of the sample image into a pre-trained second network for processing to obtain a third processing result of the transformed image; and training the first network according to the labeling results and the first processing results of the plurality of sample images, and the second processing results and the third processing results of the transformed images of the plurality of sample images. The embodiment of the disclosure can improve the performance of the trained student network.

Description

Network training method and device, and image processing method and device

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a network training method and apparatus, and an image processing method and apparatus.

Background

In computer vision tasks, the performance of a network model is closely tied to its scale. A large model can achieve good performance, but its deployment on mobile terminals with limited computing resources is constrained by its large parameter count and long inference time. Model compression mainly addresses the problem of how to achieve the effect of a large model with a small one. Model compression methods include model pruning, model quantization, knowledge distillation and the like; knowledge distillation is a commonly used method because of its simple design and good results. In knowledge distillation, a large, well-performing teacher model is trained first, and a small student model is then trained to imitate the teacher model, which improves the performance of the student model.

Disclosure of Invention

The present disclosure provides a technical scheme of network training and image processing.

According to an aspect of the present disclosure, there is provided a network training method, including: respectively inputting a sample image in a training set and a transformed image of the sample image into a first network for processing to obtain a first processing result of the sample image and a second processing result of the transformed image, wherein the transformed image is obtained by performing geometric transformation on the sample image; inputting the transformed image of the sample image into a pre-trained second network for processing to obtain a third processing result of the transformed image; and training the first network according to the labeling results and the first processing results of the plurality of sample images, and the second processing results and the third processing results of the transformed images of the plurality of sample images.

In one possible implementation manner, training the first network according to the labeling result and the first processing result of the plurality of sample images, and the second processing result and the third processing result of the transformed images of the plurality of sample images includes: determining a first loss of the first network according to the labeling results and the first processing results of the plurality of sample images; determining a second loss of the first network according to a second processing result and a third processing result of a transformed image of the plurality of sample images; training the first network according to the first loss and the second loss.

In one possible implementation, the method further includes: inputting the sample image into a pre-trained second network for processing to obtain a fourth processing result of the sample image;

the training the first network according to the labeling result and the first processing result of the plurality of sample images, and the second processing result and the third processing result of the transformed images of the plurality of sample images, further includes: determining a third loss of the first network according to the first processing result and the fourth processing result of the plurality of sample images; training the first network according to the first loss, the second loss, and the third loss.

In one possible implementation, training the first network according to the first loss, the second loss, and the third loss includes: determining a network loss of the first network based on a weighted sum of the first loss, the second loss, and the third loss; and adjusting the network parameters of the first network according to the network loss of the first network.

In one possible implementation, the second processing result comprises a first prediction probability distribution of a transformed image, the third processing result comprises a second prediction probability distribution of the transformed image,

the determining a second loss of the first network according to a second processing result and a third processing result of a transformed image of the plurality of sample images, comprising: determining a second loss of the first network based on a distance between the first predictive probability distribution and the second predictive probability distribution.

In one possible implementation, the method further includes: and performing geometric transformation on the sample image to obtain at least one transformed image of the sample image, wherein the category of the geometric transformation comprises at least one of rotation, distortion, stretching and compression.

According to an aspect of the present disclosure, there is provided an image processing method including: and inputting the image to be processed into a first network for processing to obtain an image classification result of the image to be processed, wherein the first network is obtained by training according to the network training method.

According to an aspect of the present disclosure, there is provided a network training apparatus, including: the first processing module is used for respectively inputting a sample image in a training set and a transformed image of the sample image into a first network for processing to obtain a first processing result of the sample image and a second processing result of the transformed image, wherein the transformed image is obtained by performing geometric transformation on the sample image; the second processing module is used for inputting the transformed image of the sample image into a pre-trained second network for processing to obtain a third processing result of the transformed image; and the training module is used for training the first network according to the labeling results and the first processing results of the plurality of sample images, and the second processing results and the third processing results of the transformed images of the plurality of sample images.

In one possible implementation, the training module includes: the first loss determining submodule is used for determining first loss of the first network according to the labeling results and the first processing results of the plurality of sample images; a second loss determining sub-module configured to determine a second loss of the first network according to a second processing result and a third processing result of a transformed image of the plurality of sample images; and the first training submodule is used for training the first network according to the first loss and the second loss.

In one possible implementation, the apparatus further includes: the third processing module is used for inputting the sample image into a pre-trained second network for processing to obtain a fourth processing result of the sample image; the training module further comprises: a third loss determining submodule, configured to determine a third loss of the first network according to the first processing result and the fourth processing result of the plurality of sample images; and the second training submodule is used for training the first network according to the first loss, the second loss and the third loss.

In one possible implementation, the second training submodule is configured to: determining a network loss of the first network based on a weighted sum of the first loss, the second loss, and the third loss; and adjusting the network parameters of the first network according to the network loss of the first network.

In one possible implementation, the second processing result includes a first prediction probability distribution of a transformed image, the third processing result includes a second prediction probability distribution of the transformed image, and the second loss determination sub-module is configured to: determining a second loss of the first network based on a distance between the first predictive probability distribution and the second predictive probability distribution.

In one possible implementation, the apparatus further includes: and the transformation module is used for carrying out geometric transformation on the sample image to obtain at least one transformed image of the sample image, wherein the category of the geometric transformation comprises at least one of rotation, distortion, stretching and compression.

According to an aspect of the present disclosure, there is provided an image processing apparatus including: and the image classification module is used for inputting the image to be processed into a first network for processing to obtain an image classification result of the image to be processed, wherein the first network is obtained by training according to the network training device.

According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.

According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.

In the embodiment of the disclosure, the labeling result of the sample and the processing result of the second network as the teacher network on the geometric transformation image of the sample can be simultaneously used as the supervision signal, so that the training of the first network as the student network is realized, deep information can be more fully mined from the teacher network, and the performance of the trained student network is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.

Fig. 1 shows a flow diagram of a network training method according to an embodiment of the present disclosure.

Fig. 2 shows a schematic diagram of a training process of a network training method according to an embodiment of the present disclosure.

Fig. 3 shows a block diagram of a network training apparatus according to an embodiment of the present disclosure.

Fig. 4 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.

Fig. 5 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.

Detailed Description

Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B and C may mean including any one or more elements selected from the group consisting of A, B and C.

Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.

Fig. 1 shows a flowchart of a network training method according to an embodiment of the present disclosure, as shown in fig. 1, the network training method includes:

in step S11, a sample image in a training set and a transformed image of the sample image are input to a first network and processed, respectively, to obtain a first processing result of the sample image and a second processing result of the transformed image, where the transformed image is obtained by geometrically transforming the sample image;

in step S12, inputting a transformed image of the sample image into a pre-trained second network for processing, so as to obtain a third processing result of the transformed image;

in step S13, the first network is trained according to the labeling results and the first processing results of the plurality of sample images, and the second processing results and the third processing results of the transformed images of the plurality of sample images.

In one possible implementation, the network training method may be performed by an electronic device such as a terminal device or a server, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling a computer readable instruction stored in a memory. Alternatively, the method may be performed by a server.

For example, in knowledge distillation, a larger-scale teacher network (which may be referred to as the second network) may be trained in advance, and a smaller-scale student network (which may be referred to as the first network) may then be trained with the help of the teacher network. The precision of the teacher network is higher than that of the trained student network. Compared with the student network, the teacher network may have more layers, extract more image features, or include additional functional modules, so as to achieve higher precision than the student network. The differences between the teacher network and the student network may vary with the type of network model, and the present disclosure is not limited thereto.

In one possible implementation, the first network and the second network may be convolutional neural networks including convolutional layers, pooling layers, fully connected layers, and the like. The present disclosure does not limit the specific network structures of the first network and the second network.

In the training of the student network, in order to more fully mine deep-layer information from the teacher network, the sample image can be geometrically transformed to obtain a transformed image, and the prediction result of the teacher network on the transformed image is used as an additional supervision signal to improve the performance of the student network.

In one possible implementation, a training set may be preset, where the training set includes a plurality of sample images (e.g., photographs taken naturally, etc.), labeling results of the sample images, and transformed images of the sample images. The labeling result of the sample images can be, for example, manually labeled labels, and each sample image can correspond to at least one transformed image. A severe geometric transformation (e.g., a large angular rotation) may be performed on the sample image resulting in a transformed image. The present disclosure does not limit the specific geometric transformation manner and the number of transformed images corresponding to each sample image.

In one possible implementation manner, in step S11, any sample image and the transformed image of the sample image may be input into the first network, and the first processing result of the sample image and the second processing result of the transformed image may be output. When the transformed image of the sample image is multiple, multiple second processing results can be obtained accordingly.

In one possible implementation, in step S12, the transformed image of the sample image may be input into the second network, and the third processing result of the transformed image may be output. When the transformed image of the sample image is multiple, multiple third processing results can be obtained accordingly.

In one possible implementation manner, in step S13, according to the labeling result and the first processing result of a plurality of sample images (e.g., all or part of the sample images in the training set), the difference between the label of each sample image and the processing result of the first network may be determined; determining a difference between the processing result of the first network and the processing result of the second network according to a second processing result and a third processing result of a transformed image of the plurality of sample images; the first network is then trained on these differences.
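The data flow of steps S11 and S12, whose outputs step S13 then compares, can be sketched as follows. This is an illustrative Python sketch only, not the disclosed implementation: the names `collect_results`, `Image`, `Probs` and `Network` are hypothetical, and each network is assumed to be a plain callable that maps an image to a predicted probability distribution.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]           # hypothetical image type: 2D grid of pixel values
Probs = List[float]                 # predicted probability distribution over categories
Network = Callable[[Image], Probs]  # hypothetical stand-in for a network's forward pass


def collect_results(
    sample: Image,
    transformed: Sequence[Image],
    first_network: Network,
    second_network: Network,
) -> Tuple[Probs, List[Probs], List[Probs]]:
    """Gather the processing results described in steps S11 and S12."""
    # S11: the first (student) network processes the sample image
    # and each of its transformed images.
    first_result = first_network(sample)
    second_results = [first_network(t) for t in transformed]
    # S12: the pre-trained second (teacher) network processes
    # the same transformed images.
    third_results = [second_network(t) for t in transformed]
    return first_result, second_results, third_results
```

With several transformed images per sample, the second and third processing results are lists of equal length, so they can be compared pair by pair in step S13.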

According to the embodiment of the disclosure, the labeling result of the sample and the processing result of the geometric transformation image of the sample by the second network can be simultaneously used as the supervision signal, so that the training of the first network is realized, the deep information can be more fully mined from the teacher network, and the performance of the trained student network is improved.

In one possible implementation, before step S11, the method may further include: and performing geometric transformation on the sample image to obtain at least one transformed image of the sample image, wherein the category of the geometric transformation comprises at least one of rotation, distortion, stretching and compression.

For example, for any sample image in the training set, the sample image may be geometrically transformed to obtain at least one transformed image of the sample image. Wherein the category of the geometric transformation includes at least one of rotation, distortion, stretching, and compression. The geometric transformation of the same category may be performed on the sample image multiple times, different categories may be performed separately, or multiple categories may be performed simultaneously. For example, the sample image may be rotated by a plurality of angles to obtain a plurality of transformed images; or rotating the sample image to obtain a transformed image, and stretching the sample image to obtain another transformed image, thereby obtaining a plurality of transformed images; the sample image may also be simultaneously rotated and stretched to obtain one transformed image, and simultaneously rotated and compressed to obtain another transformed image, thereby obtaining multiple transformed images. The present disclosure does not limit the number of transformed images per sample image nor the specific manner of geometric transformation.

In one possible implementation, the step of geometrically transforming the sample image to obtain at least one transformed image of the sample image may include: and respectively rotating the sample image by 90 degrees, 180 degrees and 270 degrees to obtain three transformation images of the sample image. That is, the sample image may be rotated by a large angle, 90 °, 180 °, and 270 °, respectively, to obtain three transformed images of the sample image. The algorithm employed to implement the rotation is not limited by the present disclosure.
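As a minimal sketch of the 90°, 180° and 270° rotations, assuming the image is a 2D grid stored as a list of rows and that rotation is counterclockwise (the disclosure fixes neither the representation nor the direction), the three transformed images could be produced as follows. The function names `rotate90_ccw` and `rotated_views` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # hypothetical representation: list of pixel rows


def rotate90_ccw(image: Image) -> Image:
    """Rotate a 2D grid by 90 degrees counterclockwise."""
    # Transposing turns columns into rows; reversing the row order
    # then moves the original right-hand column to the top.
    return [list(row) for row in zip(*image)][::-1]


def rotated_views(image: Image) -> List[Image]:
    """Return the 90, 180 and 270 degree rotations of the image, in that order."""
    views = []
    current = image
    for _ in range(3):
        current = rotate90_ccw(current)
        views.append(current)
    return views
```

Applying `rotate90_ccw` a fourth time returns the original grid, which is a convenient sanity check for any rotation routine used in its place.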

In this way, a transformed image for training can be obtained and the number of samples can be increased.

In one possible implementation, after the transformed images of each sample image in the training set are obtained, steps S11 and S12 may be performed multiple times to obtain the first processing results of the plurality of sample images, and the second processing results and the third processing results of the transformed images of the plurality of sample images.

In one possible implementation, step S13 may include:

determining a first loss of the first network according to the labeling results and the first processing results of the plurality of sample images;

determining a second loss of the first network according to a second processing result and a third processing result of a transformed image of the plurality of sample images;

training the first network according to the first loss and the second loss.

For example, a first loss of the first network, i.e., the label loss L_ce, may be determined according to the labeling results and the first processing results of the plurality of sample images. The first loss may include, for example, a cross entropy loss function (softmax loss); a person skilled in the art may set the loss function of the first loss according to actual conditions, and the present disclosure does not limit the type of the loss function of the first loss.
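A cross entropy label loss of the kind mentioned above can be sketched as follows. This is an illustrative example under stated assumptions: the first network is assumed to output raw scores (logits) per category, labels are assumed to be integer category indices, and the helper names `softmax`, `cross_entropy` and `label_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], label: int, eps: float = 1e-12) -> float:
    """Negative log-probability assigned to the labeled category."""
    return -math.log(max(probs[label], eps))


def label_loss(batch_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean cross entropy over a batch of sample images."""
    losses = [cross_entropy(softmax(z), y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)
```

The loss is small when the first network assigns high probability to the labeled category and grows as that probability shrinks.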

In one possible implementation, a second loss of the first network, i.e., the distillation loss L_t of the transformed images, may be determined according to the second processing results and the third processing results of the transformed images of the plurality of sample images.

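In one possible implementation, the second processing result includes a first predicted probability distribution of the transformed image, and the third processing result includes a second predicted probability distribution of the transformed image. In this case, determining the second loss of the first network according to the second processing results and the third processing results of the transformed images of the plurality of sample images may include: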

determining a second loss of the first network based on a distance between the first predictive probability distribution and the second predictive probability distribution.

For example, the processing result of the student network (i.e., the second processing result) and the processing result of the teacher network (i.e., the third processing result) may be represented as predicted probability distributions that objects (e.g., persons or objects) in the transformed image belong to a plurality of preset categories. The second loss of the first network needs to represent the distance between the predicted probability distribution of the student network (which may be referred to as the first predicted probability distribution) and the predicted probability distribution of the teacher network (which may be referred to as the second predicted probability distribution). Thus, the loss function for the second loss may employ the KL divergence in order to minimize the distance between the first predictive probability distribution and the second predictive probability distribution in the training. The person skilled in the art can set the loss function of the second loss according to practical situations, and the present disclosure does not limit the type of the loss function of the second loss.
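A KL divergence between the two predicted probability distributions can be sketched as follows. The direction KL(teacher || student) and the averaging over transformed images are assumptions made for illustration; the disclosure only requires a distance between the two distributions. The function names are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same categories."""
    # Terms with p_i == 0 contribute nothing; q_i is clamped to avoid division by zero.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def transformed_distillation_loss(
    student_probs: Sequence[Sequence[float]],
    teacher_probs: Sequence[Sequence[float]],
) -> float:
    """Mean KL(teacher || student) over a set of transformed images (assumed direction)."""
    pairs = list(zip(teacher_probs, student_probs))
    return sum(kl_divergence(t, s) for t, s in pairs) / len(pairs)
```

The value is zero when the first network reproduces the second network's distribution exactly and positive otherwise, so minimizing it pulls the two predictions on each transformed image together.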

For the transformed image, the classification loss need not be calculated; that is, the student network is not required to classify the transformed image correctly. Because a convolutional neural network is not rotation invariant, forcing it to classify rotated images correctly may hinder the original learning process.

In one possible implementation manner, the first network may be trained according to the first loss and the second loss, for example, a weighted sum of the first loss and the second loss is used as a network loss, and a network parameter of the first network is adjusted through gradient back propagation, so that the trained first network is obtained when a training condition (for example, network convergence) is satisfied. The present disclosure does not limit the specific training process.

In this way, the training effect of the first network can be improved.

In one possible implementation, the method may further include: inputting the sample image into a pre-trained second network for processing to obtain a fourth processing result of the sample image;

step S13 includes: determining a third loss of the first network according to the first processing result and the fourth processing result of the plurality of sample images; training the first network according to the first loss, the second loss, and the third loss.

For example, the sample image may be input into the second network, the fourth processing result of the sample image may be output, and the output of the sample image by the second network may be simultaneously used as a supervision signal to implement the training of the first network.

In a possible implementation manner, the third loss of the first network, that is, the distillation loss L_kd of the normal images, may be determined according to the first processing results and the fourth processing results of the plurality of sample images. This third loss also needs to represent the distance between the processing result of the student network (i.e., its predicted probability distribution) and the processing result of the teacher network (i.e., its predicted probability distribution), so the KL divergence can be employed to minimize the distance between the two probability distributions during training. A person skilled in the art can set the loss function of the third loss according to actual conditions, and the present disclosure does not limit the type of the loss function of the third loss.

In one possible implementation, the first network may be trained jointly based on the first loss, the second loss, and the third loss. In this way, the training effect of the first network can be further improved.

In one possible implementation, the step of training the first network according to the first loss, the second loss and the third loss may include:

determining a network loss of the first network based on a weighted sum of the first loss, the second loss, and the third loss; and adjusting the network parameters of the first network according to the network loss of the first network.

For example, one skilled in the art may set the weight of each loss according to the data set (e.g., different types of sample images), and determine the weighted sum of the individual losses as the overall network loss Loss of the first network:

Loss = w_ce·L_ce + w_t·L_t + w_kd·L_kd    (1)

In formula (1), L_ce, L_t and L_kd may represent the first loss, the second loss and the third loss, respectively; w_ce, w_t and w_kd may represent the weights of the first loss, the second loss and the third loss, respectively.

In one possible implementation, the network parameters of the first network may be adjusted by gradient back propagation based on the Loss. In case that a training condition (e.g., network convergence) is satisfied, a trained first network is obtained. The present disclosure does not limit the specific training process.
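The weighted sum of the first loss, the second loss and the third loss can be written as a one-line function. This is a sketch only: the function name `network_loss` and the default weights of 1.0 are hypothetical, since the disclosure leaves the weights to be set per data set.

```python
def network_loss(
    l_ce: float,
    l_t: float,
    l_kd: float,
    w_ce: float = 1.0,  # hypothetical default weight of the label loss
    w_t: float = 1.0,   # hypothetical default weight of the transformed-image distillation loss
    w_kd: float = 1.0,  # hypothetical default weight of the normal-image distillation loss
) -> float:
    """Weighted sum of the label loss and the two distillation losses."""
    return w_ce * l_ce + w_t * l_t + w_kd * l_kd
```

Setting `w_kd` to zero reduces this to the two-loss variant in which the first network is trained from the first loss and the second loss only.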

Fig. 2 shows a schematic diagram of a training process of a network training method according to an embodiment of the present disclosure. As shown in fig. 2, the sample image 21 is a plurality of sample images in the training set, and the transformed image 22 is an image obtained by performing large-angle rotation on the sample image 21. The sample images 21 are input to the first network 20, respectively, to obtain normal outputs 23 (i.e., first processing results of the respective sample images), and the transformed images 22 are input to the first network 20, respectively, to obtain transformed outputs 24 (i.e., second processing results of the respective transformed images).

From the distance between the labeling results (not shown) of the sample images and the normal output 23, the first loss L_ce of the first network 20, i.e., the label loss, can be obtained. From the distance between the processing results of the second network (not shown) on the plurality of sample images and the normal output 23, the third loss L_kd of the first network 20, i.e., the distillation loss of the normal images, can be obtained. From the distance between the processing results of the second network on the plurality of transformed images and the transformed output 24, the second loss L_t of the first network 20, i.e., the distillation loss of the transformed images, can be obtained. Thus, the first network 20 can be trained with the first loss L_ce, the second loss L_t and the third loss L_kd.

According to an embodiment of the present disclosure, there is also provided an image processing method including: and inputting the image to be processed into a first network for processing to obtain an image classification result of the image to be processed, wherein the first network is obtained by training according to the network training method.

That is to say, the first network trained according to the method may be deployed, for example, deployed in a terminal device (e.g., a smartphone), and perform tasks such as object classification and face recognition. The image to be processed is processed through the first network, and a high-precision image classification result can be obtained.

According to the embodiments of the present disclosure, a network training method based on knowledge distillation with image-transformation consistency is realized; that is, the trained student network and the teacher network give the same prediction for an image to which the same geometric transformation has been applied. The network training method of the embodiments of the present disclosure mines deep information from the teacher network more fully and improves the performance of the student network; it also makes the student network more robust in settings with few samples or noisy samples.

Compared with other knowledge distillation methods, the method of the embodiments of the present disclosure can noticeably improve the performance of the student network on classification tasks. The method only matches the results of the output layers of the teacher network and the student network and does not involve matching the feature maps or attention maps of intermediate layers, which leaves the student network more room to search for parameters suited to its own network structure; compared with other algorithms, it therefore performs better when the structures of the teacher network and the student network are completely different. By constraining the consistency of predictions after transformation, the method forces the student network to mine more information from the teacher network and reduces the dependence on labels, so that the student network performs better when training samples are few or noisy.

The method according to the embodiments of the present disclosure can be applied to tasks such as object classification and face recognition. It can replace previous knowledge distillation algorithms, thereby greatly improving the performance of the student network, or it can be combined with previous algorithms to improve the performance of the student network further.

It is understood that the method embodiments of the present disclosure described above can be combined with one another to form combined embodiments without departing from their principles and logic; for reasons of space, these combinations are not described in detail in the present disclosure. Those skilled in the art will appreciate that, in the methods of the specific embodiments above, the specific order of execution of the steps should be determined by their functions and possible inherent logic.

In addition, the present disclosure also provides network training and image processing apparatuses, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any network training or image processing method provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method sections, which are not repeated here.

Fig. 3 shows a block diagram of a network training apparatus according to an embodiment of the present disclosure. As shown in Fig. 3, the network training apparatus includes:

a first processing module 31, configured to input a sample image in a training set and a transformed image of the sample image into a first network respectively for processing, so as to obtain a first processing result of the sample image and a second processing result of the transformed image, where the transformed image is obtained by performing geometric transformation on the sample image;

a second processing module 32, configured to input a transformed image of the sample image into a pre-trained second network for processing, so as to obtain a third processing result of the transformed image;

a training module 33, configured to train the first network according to the labeling results and the first processing results of the plurality of sample images, and the second processing results and the third processing results of the transformed images of the plurality of sample images.
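The data flow between the three modules can be sketched as below. This is a toy illustration under stated assumptions, not the apparatus itself: the linear "networks", the 90-degree rotation used as the geometric transformation, the cross-entropy plus weighted KL loss form, and all function names are hypothetical, and the parameter update that module 33 would perform on the resulting loss is left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def rotate90(image):
    """Geometric transformation: rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def linear_net(weights):
    """Build a toy network: one weight row per class over the flattened image."""
    def forward(image):
        x = [p for row in image for p in row]
        return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])
    return forward


def first_processing_module(first_net, image):
    """Module 31: first network on the sample image and on its transformed image."""
    transformed = rotate90(image)
    return first_net(image), first_net(transformed), transformed


def second_processing_module(second_net, transformed):
    """Module 32: pre-trained second network on the transformed image."""
    return second_net(transformed)


def training_loss(label, first, second, third, weight=1.0):
    """Quantity module 33 would minimise (assumed form: CE + weight * KL)."""
    ce = -math.log(first[label] + 1e-12)
    kl = sum(t * math.log((t + 1e-12) / (s + 1e-12)) for t, s in zip(third, second))
    return ce + weight * kl


sample_image, label = [[1.0, 0.0], [0.0, 0.0]], 0
student = linear_net([[0.0] * 4, [0.0] * 4])    # untrained first network
teacher = linear_net([[2.0] * 4, [-2.0] * 4])   # stand-in for the pre-trained second network

first, second, transformed = first_processing_module(student, sample_image)
third = second_processing_module(teacher, transformed)
loss = training_loss(label, first, second, third)
```

Here `first`, `second`, and `third` play the roles of the first, second, and third processing results; in a real implementation the loss would be accumulated over many sample images and minimised with a gradient-based optimizer.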

In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the method embodiments above. For their specific implementation, refer to the description of those method embodiments; for brevity, the details are not repeated here.

Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.

An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.

The embodiments of the present disclosure also provide a computer program product, which includes computer readable code, and when the computer readable code runs on a device, a processor in the device executes instructions for implementing the network training method provided in any one of the above embodiments.

The embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed cause a computer to perform the operations of the network training method provided in any of the above embodiments.

The electronic device may be provided as a terminal, server, or other form of device.

Fig. 4 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or a similar terminal.

Referring to fig. 4, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as a display and keypad of the electronic device 800. The sensor component 814 may also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.

Fig. 5 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 5, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.

The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.

The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.

The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, the electronic circuitry that can execute the computer-readable program instructions implements aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA).

Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The computer program product may be embodied in hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).

Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A method of network training, comprising:

respectively inputting a sample image in a training set and a transformed image of the sample image into a first network for processing to obtain a first processing result of the sample image and a second processing result of the transformed image, wherein the transformed image is obtained by performing geometric transformation on the sample image;

inputting the transformed image of the sample image into a pre-trained second network for processing to obtain a third processing result of the transformed image;

and training the first network according to the labeling results and the first processing results of the plurality of sample images, and the second processing results and the third processing results of the transformed images of the plurality of sample images.

2. The method of claim 1, wherein training the first network according to the labeling result and the first processing result of the plurality of sample images, the second processing result and the third processing result of the transformed images of the plurality of sample images comprises:

training the first network according to the first loss and the second loss.

3. The method of claim 2, further comprising:

inputting the sample image into a pre-trained second network for processing to obtain a fourth processing result of the sample image;

the training the first network according to the labeling result and the first processing result of the plurality of sample images, and the second processing result and the third processing result of the transformed images of the plurality of sample images, further includes:

determining a third loss of the first network according to the first processing result and the fourth processing result of the plurality of sample images;

training the first network according to the first loss, the second loss, and the third loss.

4. The method of claim 3, wherein training the first network based on the first loss, the second loss, and the third loss comprises:

determining a network loss of the first network based on a weighted sum of the first loss, the second loss, and the third loss;

and adjusting the network parameters of the first network according to the network loss of the first network.

5. The method according to any of claims 2-4, wherein the second processing result comprises a first prediction probability distribution of a transformed image, the third processing result comprises a second prediction probability distribution of the transformed image,

the determining a second loss of the first network according to a second processing result and a third processing result of a transformed image of the plurality of sample images, comprising:

6. The method according to any one of claims 1-5, further comprising:

and performing geometric transformation on the sample image to obtain at least one transformed image of the sample image, wherein the category of the geometric transformation comprises at least one of rotation, distortion, stretching and compression.

7. An image processing method, comprising:

inputting an image to be processed into a first network for processing, and obtaining an image classification result of the image to be processed, wherein the first network is obtained by training according to the network training method of any one of claims 1 to 6.

8. A network training apparatus, comprising:

the first processing module is used for respectively inputting a sample image in a training set and a transformed image of the sample image into a first network for processing to obtain a first processing result of the sample image and a second processing result of the transformed image, wherein the transformed image is obtained by performing geometric transformation on the sample image;

the second processing module is used for inputting the transformed image of the sample image into a pre-trained second network for processing to obtain a third processing result of the transformed image;

and the training module is used for training the first network according to the labeling results and the first processing results of the plurality of sample images, and the second processing results and the third processing results of the transformed images of the plurality of sample images.

9. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to invoke the memory-stored instructions to perform the method of any of claims 1 to 7.

10. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 7.