CN113537491A - Neural network training method, device and computer readable medium - Google Patents

Neural network training method, device and computer readable medium

Info

Publication number
CN113537491A
Authority
CN
China
Prior art keywords
characteristic information
training
degrees
neural network
picture
Prior art date
Legal status
Pending
Application number
CN202110813756.8A
Other languages
Chinese (zh)
Inventor
姚广
苏仲岳
闫正
Current Assignee
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN202110813756.8A
Publication of CN113537491A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a neural network training scheme. A first neural network is trained first and serves as a pre-network, with its parameters fixed after training; a second neural network, comprising an encoder and a decoder, performs the subsequent processing. The second neural network is first trained with the target of making the second, third and fourth decoding characteristic information approach the second, third and fourth characteristic information, respectively; the parameters of the encoder are then fixed and the decoder of the second neural network is trained further to update the parameters of the decoder, thereby completing the training of the whole neural network. The trained first and second neural networks can be used together as an integral neural network, so that the directional information in an image is effectively extracted and classification is completed using that information.

Description

Neural network training method, device and computer readable medium
Technical Field
The present application relates to the field of information technology, and in particular, to a neural network training method, device, and computer readable medium.
Background
Metric Learning, a machine learning method commonly used in face recognition, was proposed by Eric Xing at NIPS 2002. It learns a feature (embedding) space in which all data are converted into feature vectors, such that the distance between the feature vectors of similar samples is small and the distance between the feature vectors of dissimilar samples is large, allowing the data to be distinguished. Metric learning, also known as Distance Metric Learning (DML) or similarity learning, is now widely used in many fields, such as image object detection, image classification, object tracking, face recognition and data classification. For image classification, however, there is currently no mature scheme for extracting the directional information in an image and using that information to complete classification.
Disclosure of Invention
It is an object of the present application to provide a neural network training scheme for extracting directional information in images and using the directional information to complete image classification.
To achieve the above object, the present application provides a neural network training method, including:
training a first neural network, and fixing parameters of the first neural network after the training is finished;
performing fractal processing on a training picture to obtain an input picture of the training picture comprising four rotation angles, inputting the input picture into the first neural network, and obtaining first feature information, second feature information, third feature information and fourth feature information, wherein the four rotation angles are respectively 0 degrees, 90 degrees, 180 degrees and 270 degrees, the first feature information represents the image feature of the training picture with the rotation angle of 0 degree, the second feature information represents the image feature of the training picture with the rotation angle of 90 degrees, the third feature information represents the image feature of the training picture with the rotation angle of 180 degrees, and the fourth feature information represents the image feature of the training picture with the rotation angle of 270 degrees;
inputting an input picture into an encoder of a second neural network for encoding to obtain encoding characteristic information, and inputting the encoding characteristic information and the first characteristic information into a decoder of the second neural network for decoding to obtain second decoding characteristic information, third decoding characteristic information and fourth decoding characteristic information;
training the second neural network according to second decoding characteristic information, third decoding characteristic information and fourth decoding characteristic information and the second characteristic information, the third characteristic information and the fourth characteristic information to update parameters of the second neural network, so that the second decoding characteristic information, the third decoding characteristic information and the fourth decoding characteristic information respectively approach to the second characteristic information, the third characteristic information and the fourth characteristic information;
and fixing the parameters of the encoder, and training the decoder of the second neural network to update the parameters of the decoder.
Further, the encoder adopts a convolutional neural network with a network depth smaller than a preset value, and the decoder comprises a fully-connected layer and a normalization layer.
Further, training the second neural network according to the second, third and fourth decoding characteristic information and the second, third and fourth characteristic information to update parameters of the second neural network, so that the second, third and fourth decoding characteristic information respectively approach the second, third and fourth characteristic information, includes:
updating the parameters of the second neural network by using a mean square loss function, so that the differences between the second, third and fourth decoding characteristic information and the second, third and fourth characteristic information, respectively, are smaller than a preset value.
Further, training the first neural network comprises:
performing fractal processing on a training picture to obtain an input picture of the training picture comprising four rotation angles, wherein the four rotation angles are respectively 0 degree, 90 degrees, 180 degrees and 270 degrees;
inputting the input picture into the neural network for forward propagation, and acquiring first characteristic information, second characteristic information, third characteristic information and fourth characteristic information corresponding to the input picture, wherein the first characteristic information represents the image characteristic of a training picture with a rotation angle of 0 degrees, the second characteristic information represents the image characteristic of the training picture with a rotation angle of 90 degrees, the third characteristic information represents the image characteristic of the training picture with a rotation angle of 180 degrees, and the fourth characteristic information represents the image characteristic of the training picture with a rotation angle of 270 degrees;
calculating loss function values corresponding to different rotation angles according to the output characteristic information, calculating gradients according to the loss function values, and performing back propagation to update parameters of the neural network;
repeating the forward propagation and the backward propagation until the neural network converges.
Further, performing fractal processing on a training picture to obtain an input picture of the training picture including four rotation angles, where the four rotation angles are 0 °, 90 °, 180 °, and 270 °, respectively, and the method includes:
respectively rotating the same training picture by 90 degrees, 180 degrees and 270 degrees to obtain the training pictures with the rotation angles of 90 degrees, 180 degrees and 270 degrees;
combining the training pictures which are not rotated with the training pictures with the rotation angles of 90 degrees, 180 degrees and 270 degrees to obtain the input pictures of the training pictures with four rotation angles, wherein the four rotation angles are respectively 0 degrees, 90 degrees, 180 degrees and 270 degrees.
Further, combining the training picture which is not rotated with the training pictures with the rotation angles of 90 °, 180 ° and 270 ° to obtain an input picture of the training picture including four rotation angles, which are 0 °, 90 °, 180 °, and 270 °, respectively, includes:
combining the training pictures which are not rotated with the training pictures with the rotation angles of 90 degrees, 180 degrees and 270 degrees into a picture matrix in the order [[p1, p2], [p4, p3]], and taking the picture matrix as an input picture of the training pictures with four rotation angles, wherein p1, p2, p3 and p4 are the training pictures with the rotation angles of 0 degrees, 90 degrees, 180 degrees and 270 degrees, respectively.
Further, the structure of the neural network comprises a backbone network, a fully-connected layer and a normalization layer;
inputting the input picture into the neural network for forward propagation and extracting output characteristic information corresponding to the input picture includes:
inputting the input picture into the backbone network, and outputting N-dimensional characteristic information;
inputting the N-dimensional characteristic information into the fully-connected layer and the normalization layer, and outputting four groups of M-dimensional characteristic information, wherein the four groups of M-dimensional characteristic information are respectively the first, second, third and fourth characteristic information in the output characteristic information.
Further, calculating loss function values corresponding to different rotation angles according to the output characteristic information, calculating gradients according to the loss function values, and performing back propagation to update parameters of the neural network, including:
calculating a first loss function value, a second loss function value, a third loss function value and a fourth loss function value according to the first characteristic information, the second characteristic information, the third characteristic information and the fourth characteristic information respectively;
and calculating the average value of the first loss function value, the second loss function value, the third loss function value and the fourth loss function value, calculating a gradient according to the average value, and performing back propagation to update the parameters of the neural network.
Based on another aspect of the application, there is also provided a computing device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, cause the device to perform the steps of the neural network training method.
In addition, a computer readable medium is provided, on which computer readable instructions are stored, and the computer readable instructions can be executed by a processor to implement the steps of the neural network training method.
Compared with the prior art, in the neural network training scheme provided by the present application, the first neural network can be trained to serve as a pre-network, with its parameters fixed after training, and after an input picture obtained through fractal processing is fed into the first neural network, four groups of characteristic information containing directional information can be extracted. The input picture is also fed into the encoder of a second neural network for encoding; after the encoding characteristic information is obtained, it is input together with the first characteristic information into the decoder of the second neural network for decoding, yielding the second, third and fourth decoding characteristic information. The second neural network is trained with these, the training target at this point being to make the second, third and fourth decoding characteristic information approach the second, third and fourth characteristic information, respectively. After this training is finished, the parameters of the encoder can be fixed and the decoder of the second neural network trained further, updating the parameters of the decoder and thereby completing the training of the whole neural network. The trained first and second neural networks can then be used as an integral neural network, so that the directional information in an image is effectively extracted and classification is completed using that information.
Drawings
Fig. 1 is a training flowchart of a neural network training method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a training process performed on a first neural network according to an embodiment of the present disclosure;
FIG. 3 is an un-rotated training picture used in an embodiment of the present application;
FIG. 4 is a training picture rotated 90° clockwise as used in the embodiments of the present application;
FIG. 5 is a training picture rotated 180° clockwise as used in the embodiments of the present application;
FIG. 6 is a training picture rotated 270° clockwise as used in the embodiments of the present application;
FIG. 7 is an input image composed of training images at various rotation angles according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the overall structure of a neural network used in the embodiment of the present application;
FIG. 9 is a detailed structural diagram of a neural network used in an embodiment of the present application;
FIG. 10 is a schematic diagram of a process of back propagation in an embodiment of the present application;
FIG. 11 is a diagram illustrating an actual feature vector used in calculating the loss function value according to an embodiment of the present application;
the same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In a typical configuration of the present application, the terminal and the devices serving the network each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
The embodiment of the application provides a neural network training method. A first neural network is first trained to serve as a front-end network, and its parameters are fixed after training is finished; once an input picture obtained through fractal processing is input into this first neural network, four groups of characteristic information including directional information can be extracted. A second neural network, comprising an encoder and a decoder, is used for the subsequent processing. The input picture is input into the encoder of the second neural network for encoding, and after the encoding characteristic information is obtained, it may be input together with the first characteristic information into the decoder of the second neural network for decoding, obtaining the second, third and fourth decoding characteristic information. The second neural network may be trained with these, the training target at this point being to make the second, third and fourth decoding characteristic information approach the second, third and fourth characteristic information, respectively. After this training is completed, the parameters of the encoder may be fixed and the decoder of the second neural network trained further, updating the parameters of the decoder and thereby completing the training of the entire neural network. The trained first and second neural networks can be used as an integral neural network, so that the directional information in an image is effectively extracted and classification is completed using that information.
The execution subject of the method may be user equipment, network equipment, a device formed by integrating user equipment and network equipment through a network, or an application program running on such a device. The user equipment includes but is not limited to various terminal devices such as computers, mobile phones and tablet computers; the network equipment includes but is not limited to implementations such as a network host, a single network server, multiple sets of network servers, or a cloud-computing-based collection of computers. Here, the cloud is made up of a large number of hosts or web servers based on Cloud Computing, a type of distributed computing in which one virtual computer consists of a collection of loosely coupled computers.
Fig. 1 illustrates a training flow of a neural network training method provided in an embodiment of the present application, which may include the following processing steps:
Step S101, training a first neural network, and fixing the parameters of the first neural network after the training is finished. The first neural network serves as a pre-network in the present scheme and is used to extract the directional information of a picture; its training process, shown in fig. 2, includes:
step S201, inputting the input picture into the first neural network for forward propagation, and extracting first feature information, second feature information, third feature information, and fourth feature information corresponding to the input picture.
In the embodiment of the application, the four rotation angles selected for fractal processing are 0°, 90°, 180°, and 270°. The training picture with a rotation angle of 0° is the training picture that has not been rotated, i.e., the initial training picture, while the training pictures with rotation angles of 90°, 180° and 270° are obtained by rotating it by the corresponding angle. The direction of rotation within the same application scenario is uniform: the training picture is rotated by 90°, 180° and 270° either all clockwise or all counterclockwise, yielding the training pictures in those directions.
The input picture includes the training pictures at the four rotation angles. In an actual scene, the input picture may be represented as a picture matrix in which the training pictures at the four rotation angles are arranged in a certain order. For example, when the picture matrix takes a 1 × 4 arrangement, the input picture may be represented as the picture matrix [p1, p2, p3, p4], where p1, p2, p3 and p4 are the training pictures rotated by 0°, 90°, 180° and 270°, respectively. When the picture matrix takes a 2 × 2 arrangement, the input picture can be represented as the picture matrix [[p1, p2], [p4, p3]].
In some embodiments of the present application, the specific process of the fractal processing may include the following steps:
First, the same training picture is rotated by 90°, 180° and 270°, respectively, to obtain the training pictures rotated by 90°, 180° and 270°. For example, taking the training picture shown in fig. 3 as an example, its clockwise rotations by 90°, 180° and 270° are shown in figs. 4, 5 and 6, respectively.
Then, the un-rotated training picture (i.e., fig. 3) is combined with the training pictures rotated by 90°, 180° and 270° (figs. 4 to 6) to obtain the input picture containing the training picture at the above four rotation angles. If the input picture takes the form of a picture matrix in the 2 × 2 arrangement, the un-rotated training picture and the rotated ones can be combined into a picture matrix in the order [[p1, p2], [p4, p3]], giving the input picture shown in fig. 7. The input picture has at least two characteristics: a, the picture matrix contains the same training picture at multiple angles, so the first neural network can extract directional information for each orthogonal angle from the input picture; b, the picture matrix is invariant under right-angle rotation: no matter which of the four angles 0°, 90°, 180° or 270° it is rotated by, the picture matrix is unchanged.
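For illustration, the fractal processing above can be sketched in a few lines of PyTorch. This is a minimal sketch assuming square (C, H, W) picture tensors; the function name make_fractal_input is ours, not the patent's:

```python
import torch

def make_fractal_input(p1: torch.Tensor) -> torch.Tensor:
    """Build the 2x2 picture matrix [[p1, p2], [p4, p3]] from one training picture.

    p1: the un-rotated training picture, shape (C, H, W) with H == W.
    Returns the combined input picture of shape (C, 2H, 2W).
    """
    p2 = torch.rot90(p1, k=-1, dims=(1, 2))  # rotated 90 deg clockwise
    p3 = torch.rot90(p1, k=-2, dims=(1, 2))  # rotated 180 deg
    p4 = torch.rot90(p1, k=-3, dims=(1, 2))  # rotated 270 deg clockwise
    top = torch.cat([p1, p2], dim=2)         # row [p1, p2]
    bottom = torch.cat([p4, p3], dim=2)      # row [p4, p3]
    return torch.cat([top, bottom], dim=1)
```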
The first neural network may output 4 sets of feature information corresponding to the four rotation angles, including first feature information, second feature information, third feature information, and fourth feature information, where the first feature information represents an image feature of a training picture rotated by 0 °, the second feature information represents an image feature of a training picture rotated by 90 °, the third feature information represents an image feature of a training picture rotated by 180 °, and the fourth feature information represents an image feature of a training picture rotated by 270 °.
The structure of the first neural network used in the embodiment of the present application may include a backbone network, a fully-connected layer and a normalization layer, as shown in fig. 8. The backbone network may adopt a BN-Inception network, shown as 710 in fig. 8; after an input picture is fed into the BN-Inception network, N-dimensional feature information is output, where the dimension N may be set according to the needs of the actual scene. For example, in this embodiment N may be set to 2048, i.e., the BN-Inception network outputs 2048-dimensional feature information for the input picture. This N-dimensional feature information may then be input into the fully-connected layer 720 and the normalization layer 730, which output four groups of M-dimensional feature information. Similarly, the specific value of M may be determined by the needs of the actual scene; for example, M in this embodiment may be 128, so that four groups of 128-dimensional feature information are finally output after the fully-connected and normalization layers.
Specifically, taking the network structure shown in fig. 9 as an example, the fully-connected layer 720 may include 4 fully-connected units fc1, fc2, fc3 and fc4, and the normalization layer 730 may include 4 normalization units norm1, norm2, norm3 and norm4. After the N-dimensional feature information output by the BN-Inception network is obtained, it may be input into each fully-connected unit of the fully-connected layer to output four groups of M-dimensional feature information, and each group may then be input into the corresponding normalization unit of the normalization layer to output the first, second, third and fourth feature information, respectively. This can also be expressed as:
E0 = Head0{Network(Img0)}
E90 = Head90{Network(Img90)}
E180 = Head180{Network(Img180)}
E270 = Head270{Network(Img270)}
Here Network denotes the backbone network used to extract image features, and Head denotes the fully-connected and normalization units corresponding to each rotation angle: Head0 is fc1 + norm1, Head90 is fc2 + norm2, Head180 is fc3 + norm3, and Head270 is fc4 + norm4. Img0, Img90, Img180 and Img270 are the training pictures at the four rotation angles, respectively.
Taking the scenario in which N is 2048 and M is 128 as an example, after the 2048-dimensional feature information output by the BN-Inception network is obtained, it is input to the fully-connected unit fc1 in the fully-connected layer 720 to obtain 128-dimensional feature information, which is then input to the corresponding normalization unit norm1 in the normalization layer 730; the normalized 128-dimensional feature information output there is the 128-dimensional first feature information. Similarly, the second, third and fourth feature information may be output by the other fully-connected units and normalization units. Each piece of feature information is 128-dimensional, and in an actual scenario the four may also be combined into a single 512-dimensional feature output, as sketched below.
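As a hedged sketch of this four-head structure: the BN-Inception backbone is stood in for by any module producing (B, N) features, and the class name FourHeadNetwork and the use of L2 normalization for the norm units are assumptions of this example, not details fixed by the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FourHeadNetwork(nn.Module):
    """Backbone + four (fully-connected + normalization) heads, one per rotation angle."""

    def __init__(self, backbone: nn.Module, n_dim: int = 2048, m_dim: int = 128):
        super().__init__()
        self.backbone = backbone  # stand-in for BN-Inception, outputs (B, n_dim)
        # fc1..fc4 in the patent's notation
        self.fc = nn.ModuleList(nn.Linear(n_dim, m_dim) for _ in range(4))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)  # (B, N), e.g. N = 2048
        # norm1..norm4: assumed here to be L2 normalization of each head's output
        e0, e90, e180, e270 = (F.normalize(fc(feats), dim=1) for fc in self.fc)
        # combine the four M-dimensional outputs into one (B, 4*M) tensor, e.g. 512-dim
        return torch.cat([e0, e90, e180, e270], dim=1)
```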
Considering memory limits during actual processing, not all data can be loaded at once, so the size of the data input to the first neural network in each pass may be set to (BS, 12, 224, 224), where BS is the batch size, a preset hyper-parameter, and 12 is the number of channels of the input data, every 3 channels corresponding to one rotation angle. The input data may therefore first be split into 4 fractal data of size (BS, 3, 224, 224), each representing one rotation angle of the same training picture. The 4 pieces can then be combined in the first dimension to obtain data of size (4·BS, 3, 224, 224), which is input into the first neural network for forward propagation; after passing through the backbone network, the fully-connected layer and the normalization layer in turn, output feature information of size (4·BS, 512) is produced. Since this feature information is combined from the outputs of the four heads, the 512-dimensional output contains the features for the different rotation angles: dimensions 1-128 represent the 0° feature, 129-256 the 90° feature, 257-384 the 180° feature, and 385-512 the 270° feature.
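A sketch of this batching step, assuming PyTorch tensors and the FourHeadNetwork sketch above:

```python
# x: (BS, 12, 224, 224), with every 3 channels holding one rotation angle
chunks = torch.split(x, 3, dim=1)  # 4 fractal tensors of size (BS, 3, 224, 224)
x_all = torch.cat(chunks, dim=0)   # (4*BS, 3, 224, 224), combined in the first dim
out = model(x_all)                 # (4*BS, 512) output feature information
```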
In an actual scene, the feature information of a picture may be considered to consist of directional information and content information, where the directional information represents the rotation angle of the picture and the content information represents the pixel content it contains; the content information of the same training picture is identical at different rotation angles. The first, second, third and fourth feature information may then also be expressed as:
E0=Func(C,D0)
E90=Func(C,D90)
E180=Func(C,D180)
E270=Func(C,D270)
where Func denotes a certain functional relationship, C denotes content information, and D0, D90, D180, and D270 denote directionality information of training pictures at different rotation angles.
Step S202, calculating loss function values corresponding to different rotation angles according to the output characteristic information, calculating a gradient according to the loss function values, and performing back propagation to update the parameters of the first neural network.
The output characteristic information comprises the first, second, third and fourth characteristic information corresponding to the different rotation angles. The back-propagation process may therefore be as shown in fig. 10: after the output characteristic information is extracted by the first neural network as described above, the first loss function value loss1, the second loss function value loss2, the third loss function value loss3 and the fourth loss function value loss4 may be calculated from the first, second, third and fourth characteristic information, respectively. In this embodiment, the loss function used to calculate these values may be the MS loss (Multi-Similarity loss) function. After the individual loss function values are obtained, their average loss_avg may be calculated, and the gradient computed from this average is back-propagated to update the parameters of the first neural network.
Still taking the foregoing scenario as an example, when the output feature information has size (4·BS, 512), it is split during back propagation into 4 pieces of sub-feature information (d1, d2, d3, d4) of size (BS, 512), each corresponding to the image features the first neural network extracted from the training picture at one rotation angle. In actual calculation the sub-feature information may be expressed in matrix form; for example, fig. 11 shows the output feature matrix when BS is 2, where each block of columns in the matrix corresponds to the sub-feature information of one rotation angle. When calculating the loss for a given rotation angle, the values belonging to the other sub-feature information may all be set to 0, so that the output corresponding to each rotation angle depends only on the shaded portion of fig. 11. Each piece of sub-feature information is then put into the MS loss function, giving the four corresponding loss function values (loss1, loss2, loss3 and loss4); the gradient is computed from the average of these four values and back-propagated, completing the update of the first neural network's parameters.
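The per-angle loss computation and averaging described above might look as follows; ms_loss is assumed to be any Multi-Similarity loss implementation taking (embeddings, labels), for example MultiSimilarityLoss from the pytorch-metric-learning package, and train_step is an illustrative name:

```python
def train_step(model, x_all, labels, ms_loss, optimizer, bs: int, m: int = 128):
    out = model(x_all)                       # (4*bs, 512)
    losses = []
    for i in range(4):                       # one pass per rotation angle
        d = out[i * bs:(i + 1) * bs]         # sub-feature information, (bs, 512)
        emb = d[:, i * m:(i + 1) * m]        # keep only this angle's 128 dims
        losses.append(ms_loss(emb, labels))  # loss1 .. loss4
    loss_avg = torch.stack(losses).mean()    # average of the four loss values
    optimizer.zero_grad()
    loss_avg.backward()                      # back propagation
    optimizer.step()
    return loss_avg.item()
```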
After the update is completed, the forward-propagation and back-propagation processes (i.e., steps S201 and S202) are repeated until the first neural network converges, completing its training. Its parameters are then fixed, and the subsequent training of the second neural network is performed.
Step S102, performing fractal processing on a training picture, acquiring an input picture of the training picture including four rotation angles, inputting the input picture into the first neural network, and acquiring first characteristic information, second characteristic information, third characteristic information and fourth characteristic information. The first neural network at this time is a neural network which has been trained and has fixed parameters, and is used as a pre-network of a subsequent second neural network to extract feature information including directional information.
Step S103, inputting the input picture into a coder of a second neural network for coding, acquiring coding characteristic information, inputting the coding characteristic information and the first characteristic information into a decoder of the second neural network for decoding, and acquiring second decoding characteristic information, third decoding characteristic information and fourth decoding characteristic information. Wherein, the second neural network can be expressed in the following way:
Mencoder: Em = Mencoder(Img_c)
Mdecoder: Dm, E90~, E180~, E270~ = Mdecoder(Em, E0)
Here Mencoder and Mdecoder denote the encoder and decoder of the second neural network M, respectively. The encoder may adopt a convolutional neural network with a network depth smaller than a preset value, for example Resnet18 in this embodiment, and an even smaller network may be used in an actual scene; the decoder may be composed of different heads, each including a fully-connected layer and a normalization layer. Img_c denotes the input picture and can be represented as the combination (Img0, Img90, Img180, Img270), i.e., the training picture at the four rotation angles. Em is the encoding characteristic information output by the encoder, and Dm, E90~, E180~ and E270~ denote the branch outputs of the decoder, where E90~, E180~ and E270~ are the second, third and fourth decoding characteristic information.
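A minimal sketch of this encoder-decoder pair follows; the width of Em, the number of classes for the Dm branch, and the class name SecondNetwork are assumptions not fixed by the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class SecondNetwork(nn.Module):
    def __init__(self, m_dim: int = 128, num_classes: int = 10):
        super().__init__()
        # Mencoder: a shallow convolutional network, e.g. Resnet18
        enc = resnet18()
        enc.fc = nn.Linear(enc.fc.in_features, m_dim)
        self.encoder = enc
        # Mdecoder: heads taking the concatenation of Em and E0 as input
        in_dim = 2 * m_dim
        self.head_dm = nn.Linear(in_dim, num_classes)  # Dm branch
        self.heads = nn.ModuleList(nn.Linear(in_dim, m_dim) for _ in range(3))

    def forward(self, img_c: torch.Tensor, e0: torch.Tensor):
        em = self.encoder(img_c)        # encoding characteristic information Em
        z = torch.cat([em, e0], dim=1)  # decoder input (Em, E0)
        dm = self.head_dm(z)
        e90t, e180t, e270t = (F.normalize(h(z), dim=1) for h in self.heads)
        return dm, e90t, e180t, e270t   # Dm, E90~, E180~, E270~
```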
Step S104, training the second neural network according to second decoding characteristic information, third decoding characteristic information, fourth decoding characteristic information, second characteristic information, third characteristic information and fourth characteristic information to update parameters of the second neural network, so that the second decoding characteristic information, the third decoding characteristic information and the fourth decoding characteristic information respectively approach to the second characteristic information, the third characteristic information and the fourth characteristic information.
The training of the second neural network can be divided into two stages. In the first stage, the parameters of the first neural network are fixed and the second neural network is trained; the training target is to make the second decoding characteristic information E90~, the third decoding characteristic information E180~ and the fourth decoding characteristic information E270~ approach the second characteristic information E90, the third characteristic information E180 and the fourth characteristic information E270, respectively, which can be represented in the following form:
E90~→E90
E180~→E180
E270~→E270
wherein "→" indicates a training target. In practical scenarios, the three decoded feature information may be approximated with E90, E180, and E180 using a mean square loss function (MSE loss). Specifically, the judgment as to whether or not to approach may be determined by judging whether or not the differences between the second decoded feature information, the third decoded feature information, and the fourth decoded feature information and the second feature information, the third feature information, and the fourth feature information, respectively, are smaller than a preset value.
After the first-stage training is completed, the second-stage training can be performed. That is, in step S105, the parameters of the encoder are fixed and only the decoder of the second neural network is trained, updating the parameters of the decoder. The training goal of the second stage is for the classification performance to meet a preset requirement; for example, various classification loss functions can be used during this stage to compute the gradient and optimize the decoder parameters. After training is completed, E90~, E180~ and E270~ can be considered to contain the directional information of the three angles, and since these pieces of feature information are homologous (all derived from the input picture), the encoding characteristic information Em can be considered to contain the directional information of every rotation angle. By combining E0 and Em as the input of the decoder, the decoder's input then contains both the content information of the picture and the directional information at every rotation angle, so that when a picture is processed by the whole trained neural network, the directional information in it can be effectively extracted and used to complete classification, yielding better classification performance.
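And the second stage freezes the encoder and trains only the decoder with a classification loss (continuing the sketch; cross-entropy and the class_labels tensor are assumptions, since the text does not mandate a particular classification loss):

```python
# Fix the encoder parameters; only the decoder is trained in stage two
for p in second_net.encoder.parameters():
    p.requires_grad = False
decoder_params = [p for p in second_net.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(decoder_params)
criterion = nn.CrossEntropyLoss()          # one possible classification loss
dm, _, _, _ = second_net(img_c, e0)
loss_stage2 = criterion(dm, class_labels)  # classification performance target
optimizer.zero_grad()
loss_stage2.backward()
optimizer.step()
```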
Furthermore, the present application also provides a computer device, which includes a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to perform the steps of the aforementioned neural network training method.
In particular, the methods and/or embodiments in the embodiments of the present application may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. The computer program, when executed by a processing unit, performs the above-described functions defined in the method of the present application.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart or block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the foregoing embodiments; or may be separate and not incorporated into the device. The computer-readable medium carries one or more computer-readable instructions executable by a processor to perform the steps of the method and/or solution of the embodiments of the present application as described above.
In addition, the embodiment of the present application also provides a computer program, where the computer program is stored in a computer device, so that the computer device executes the steps of the neural network training method.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In some embodiments, the software programs of the present application may be executed by a processor to implement the above steps or functions. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (10)

1. A neural network training method, comprising:
training a first neural network, and fixing parameters of the first neural network after the training is finished;
performing fractal processing on a training picture to obtain an input picture of the training picture comprising four rotation angles, inputting the input picture into the first neural network, and obtaining first feature information, second feature information, third feature information and fourth feature information, wherein the four rotation angles are respectively 0 degrees, 90 degrees, 180 degrees and 270 degrees, the first feature information represents the image feature of the training picture with the rotation angle of 0 degree, the second feature information represents the image feature of the training picture with the rotation angle of 90 degrees, the third feature information represents the image feature of the training picture with the rotation angle of 180 degrees, and the fourth feature information represents the image feature of the training picture with the rotation angle of 270 degrees;
inputting an input picture into an encoder of a second neural network for encoding to obtain encoding characteristic information, and inputting the encoding characteristic information and the first characteristic information into a decoder of the second neural network for decoding to obtain second decoding characteristic information, third decoding characteristic information and fourth decoding characteristic information;
training the second neural network according to second decoding characteristic information, third decoding characteristic information and fourth decoding characteristic information and the second characteristic information, the third characteristic information and the fourth characteristic information to update parameters of the second neural network, so that the second decoding characteristic information, the third decoding characteristic information and the fourth decoding characteristic information respectively approach to the second characteristic information, the third characteristic information and the fourth characteristic information;
and fixing the parameters of the encoder, and training the decoder of the second neural network to update the parameters of the decoder.
2. The method of claim 1, wherein the encoder employs a convolutional neural network having a network depth less than a predetermined value, and the decoder comprises a fully-connected layer and a normalization layer.
3. The method of claim 1, wherein training the second neural network according to the second, third and fourth decoding feature information and the second, third and fourth feature information to update parameters of the second neural network, so that the second, third and fourth decoding feature information respectively approach the second, third and fourth feature information, comprises:
updating the parameters of the second neural network by using a mean square loss function, so that the differences between the second, third and fourth decoding characteristic information and the second, third and fourth characteristic information, respectively, are smaller than a preset value.
4. The method of claim 1, wherein training the first neural network comprises:
performing fractal processing on a training picture to obtain an input picture of the training picture comprising four rotation angles, wherein the four rotation angles are respectively 0 degree, 90 degrees, 180 degrees and 270 degrees;
inputting the input picture into the neural network for forward propagation, and acquiring first characteristic information, second characteristic information, third characteristic information and fourth characteristic information corresponding to the input picture, wherein the first characteristic information represents the image characteristic of a training picture with a rotation angle of 0 degrees, the second characteristic information represents the image characteristic of the training picture with a rotation angle of 90 degrees, the third characteristic information represents the image characteristic of the training picture with a rotation angle of 180 degrees, and the fourth characteristic information represents the image characteristic of the training picture with a rotation angle of 270 degrees;
calculating loss function values corresponding to different rotation angles according to the output characteristic information, calculating gradients according to the loss function values, and performing back propagation to update parameters of the neural network;
repeating the forward propagation and the backward propagation until the neural network converges.
5. The method of claim 4, wherein performing fractal processing on a training picture to obtain an input picture of the training picture including four rotation angles, the four rotation angles being 0 °, 90 °, 180 °, and 270 °, respectively, comprises:
respectively rotating the same training picture by 90 degrees, 180 degrees and 270 degrees to obtain the training pictures with the rotation angles of 90 degrees, 180 degrees and 270 degrees;
combining the training pictures which are not rotated with the training pictures with the rotation angles of 90 degrees, 180 degrees and 270 degrees to obtain the input pictures of the training pictures with four rotation angles, wherein the four rotation angles are respectively 0 degrees, 90 degrees, 180 degrees and 270 degrees.
6. The method according to claim 5, wherein combining the training pictures that are not rotated with the training pictures at the rotation angles of 90 °, 180 ° and 270 ° to obtain the input picture of the training pictures including four rotation angles of 0 °, 90 °, 180 ° and 270 °, respectively, comprises:
combining the training pictures which are not rotated with the training pictures with the rotation angles of 90 degrees, 180 degrees and 270 degrees into a picture matrix in the order [[p1, p2], [p4, p3]], and taking the picture matrix as an input picture of the training pictures with four rotation angles, wherein p1, p2, p3 and p4 are the training pictures with the rotation angles of 0 degrees, 90 degrees, 180 degrees and 270 degrees, respectively.
7. The method of claim 4, wherein the structure of the neural network comprises a backbone network, a fully-connected layer, and a normalization layer;
inputting the input picture into the neural network for forward propagation and extracting output characteristic information corresponding to the input picture includes:
inputting the input picture into the backbone network, and outputting N-dimensional characteristic information;
inputting the N-dimensional characteristic information into the fully-connected layer and the normalization layer, and outputting four groups of M-dimensional characteristic information, wherein the four groups of M-dimensional characteristic information are respectively the first, second, third and fourth characteristic information in the output characteristic information.
8. The method of claim 4, wherein calculating loss function values corresponding to different rotation angles from the output characteristic information and calculating gradients from the loss function values, and propagating back to update parameters of the neural network, comprises:
calculating a first loss function value, a second loss function value, a third loss function value and a fourth loss function value according to the first characteristic information, the second characteristic information, the third characteristic information and the fourth characteristic information respectively;
and calculating the average value of the first loss function value, the second loss function value, the third loss function value and the fourth loss function value, calculating a gradient according to the average value, and performing back propagation to update the parameters of the neural network.
9. A computer device, characterized in that the device comprises a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, cause the device to perform the steps of the method of any of claims 1 to 8.
10. A computer-readable medium having computer-readable instructions stored thereon which are executable by a processor to implement the steps of the method of any one of claims 1 to 8.
CN202110813756.8A 2021-07-19 2021-07-19 Neural network training method, device and computer readable medium Pending CN113537491A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110813756.8A CN113537491A (en) 2021-07-19 2021-07-19 Neural network training method, device and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110813756.8A CN113537491A (en) 2021-07-19 2021-07-19 Neural network training method, device and computer readable medium

Publications (1)

Publication Number Publication Date
CN113537491A true CN113537491A (en) 2021-10-22

Family

ID=78100217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110813756.8A Pending CN113537491A (en) 2021-07-19 2021-07-19 Neural network training method, device and computer readable medium

Country Status (1)

Country Link
CN (1) CN113537491A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114710235A (en) * 2022-03-01 2022-07-05 丝路梵天(甘肃)通信技术有限公司 Communication quality enhancement system and communication system
CN114710235B (en) * 2022-03-01 2022-11-04 丝路梵天(甘肃)通信技术有限公司 Communication quality enhancement system and communication system

Similar Documents

Publication Publication Date Title
CN111898696B (en) Pseudo tag and tag prediction model generation method, device, medium and equipment
US10762373B2 (en) Image recognition method and device
CN111382555B (en) Data processing method, medium, device and computing equipment
CN113704531A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN109272442B (en) Method, device and equipment for processing panoramic spherical image and storage medium
CN114820341A (en) Image blind denoising method and system based on enhanced transform
CN113066034A (en) Face image restoration method and device, restoration model, medium and equipment
CN115082306A (en) Image super-resolution method based on blueprint separable residual error network
CN111368593B (en) Mosaic processing method and device, electronic equipment and storage medium
CN113537491A (en) Neural network training method, device and computer readable medium
US20210357647A1 (en) Method and System for Video Action Classification by Mixing 2D and 3D Features
Ketsoi et al. SREFBN: Enhanced feature block network for single‐image super‐resolution
CN113688928B (en) Image matching method and device, electronic equipment and computer readable medium
CN115115972A (en) Video processing method, video processing apparatus, computer device, medium, and program product
CN114841870A (en) Image processing method, related device and system
CN113902631A (en) Image processing method, electronic device, and storage medium
CN113610016A (en) Training method, system, equipment and storage medium of video frame feature extraction model
CN113537468A (en) Neural network training method, device and computer readable medium
CN111079704A (en) Face recognition method and device based on quantum computation
CN113177483B (en) Video object segmentation method, device, equipment and storage medium
CN114758128B (en) Scene panorama segmentation method and system based on controlled pixel embedding characterization explicit interaction
CN117808857B (en) Self-supervision 360-degree depth estimation method, device, equipment and medium
CN112966569B (en) Image processing method and device, computer equipment and storage medium
CN116051662B (en) Image processing method, device, equipment and medium
CN117292122A (en) RGB-D significance object detection and semantic segmentation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination