WO2017166155A1 - Method and device for training neural network model, and electronic device - Google Patents

Method and device for training neural network model, and electronic device

Info

Publication number
WO2017166155A1
WO2017166155A1 (PCT/CN2016/077975)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
network model
model
layer
training
Prior art date
Application number
PCT/CN2016/077975
Other languages
French (fr)
Chinese (zh)
Inventor
陈理
王淞
范伟
孙俊
直井聪
Original Assignee
富士通株式会社
陈理
王淞
范伟
孙俊
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 富士通株式会社, 陈理, 王淞, 范伟, 孙俊 filed Critical 富士通株式会社
Priority to PCT/CN2016/077975 priority Critical patent/WO2017166155A1/en
Priority to JP2018539870A priority patent/JP6601569B2/en
Priority to CN201680061886.8A priority patent/CN108140144B/en
Priority to KR1020187017577A priority patent/KR102161902B1/en
Publication of WO2017166155A1 publication Critical patent/WO2017166155A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Definitions

  • the present application relates to the field of information processing technologies, and in particular, to a method, device, and electronic device for training a neural network model.
  • CNN Convolutional Neural Network
  • the CNN model is a hierarchical model.
  • FIG. 1 is a schematic diagram of the CNN model.
  • the CNN model is composed of an input layer 101, a plurality of hidden layers 102, and an output layer 103.
  • the input layer 101 provides data to be processed corresponding to the sample to be identified.
  • the sample to be identified is a grayscale image
  • the data to be processed is a two-dimensional matrix; the type of a hidden layer 102 may be a common convolutional layer, a relaxed convolutional layer, a pooling layer, a neuron layer, or a fully connected layer, each of which provides a specific operation to process the data.
  • the output layer 103 provides the final result of the model; for a CNN model used for classification, the output layer 103 outputs the probability that the sample to be identified belongs to each class.
  • the large-scale CNN model has the following problems during training: a) the larger the model, the easier it is to overfit; b) the larger the model, the longer the training time required.
  • the embodiments of the present application provide a method, a device, and an electronic device for training a neural network model, in which a small-scale neural network model is trained, a large-scale neural network model is initialized by the small-scale neural network model, and finally the initialized large-scale neural network model is fine-tuned, thereby avoiding problems such as over-fitting and excessive training time caused by directly training a large-scale neural network.
  • a method for training a neural network model, for determining the weights in the neural network model, the method comprising:
  • Each weight in the initialized neural network model is adjusted based on a known training set.
  • extracting a portion of the neural network model comprises:
  • a portion of the neuron nodes in each of the hidden layers of the common convolutional neural network model are extracted to form the neural network sub-model.
  • initializing each weight in the neural network model to form an initialization neural network model includes:
  • the normal convolutional layer in the initialized general convolutional neural network model is transformed into a relaxed convolutional layer to form the initialized neural network model.
  • initializing each weight in the neural network model to form an initialization neural network model includes:
  • the normal convolutional layer in the adjusted ordinary convolutional neural network model is transformed into a relaxed convolutional layer to form the initialized neural network model.
  • initializing each weight in the common convolutional neural network model to form an initialized common convolutional neural network model includes:
  • initializing the weights of the corresponding hidden layers in the common convolutional neural network model according to the weights of the respective hidden layers in the optimized neural network sub-model includes:
  • the weights of the hidden layers in the optimized neural network sub-model are multiplied by predetermined coefficients as the weights of the corresponding hidden layers in the common convolutional neural network model.
  • an apparatus for training a neural network model for determining weights in a neural network model comprising:
  • An extracting unit for extracting a portion of the neural network model to form a neural network sub-model
  • a first training unit for training the neural network sub-model to form an optimized neural network sub-model
  • An initialization unit that initializes each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialization neural network model, and the initialization neural network model and the optimization Neural network submodels have the same output characteristics;
  • a second training unit that adjusts weights in the initialized neural network model based on a known training set.
  • the extracting unit includes:
  • a first conversion unit for converting a relaxed convolutional layer in the neural network model into a common convolutional layer to convert the neural network model into a general convolutional neural network model
  • An extraction subunit is configured to extract a partial neuron node in each hidden layer of the common convolutional neural network model to form the neural network submodel.
  • the initialization unit includes:
  • a first initialization subunit configured to initialize each weight in the common convolutional neural network model according to each weight in the optimized neural network submodel to form an initialization common convolutional neural network model
  • a second transformation unit for converting a common convolutional layer in the initialized general convolutional neural network model into a relaxed convolutional layer to form the initialization neural network model.
  • the initializing unit includes:
  • a second initialization subunit that initializes, according to each weight in the optimized neural network submodel, each weight in the common convolutional neural network model to form an initialized common convolutional neural network model
  • a third training unit that adjusts weights in the initialized general convolutional neural network model based on a known training set to form an adjusted general convolutional neural network model
  • a third transformation unit for converting a common convolutional layer in the adjusted general convolutional neural network model into a relaxed convolutional layer to form the initialization neural network model.
  • the first initializing subunit initializes the weights of the corresponding hidden layers in the common convolutional neural network model according to the weights of the respective hidden layers in the optimized neural network submodel, to form an initialized common convolutional neural network model, wherein the output characteristics of the hidden layers of the initialized common convolutional neural network model are the same as the output characteristics of the hidden layers of the optimized neural network submodel.
  • the first initializing subunit multiplies the weights of the hidden layer in the optimized neural network submodel by a predetermined coefficient and uses the results as the weights of the corresponding hidden layer in the common convolutional neural network model.
  • an electronic device comprising the apparatus for training a neural network model according to any one of the seventh to twelfth aspects of the embodiments.
  • a fourteenth aspect of the embodiments of the present application provides a computer readable program, wherein when the program is executed in a device or an electronic device that trains a neural network model, the program causes the device or the electronic device that trains the neural network model to perform the method of training the neural network model described in any one of the first to sixth aspects of the above embodiments.
  • a storage medium storing a computer readable program, wherein the storage medium stores the computer readable program of the fourteenth aspect of the above embodiments, and the computer readable program causes a device or an electronic device that trains a neural network model to perform the method of training a neural network model described in any one of the first to sixth aspects of the above embodiments.
  • the beneficial effects of the embodiments of the present application are: shortening the training time of large-scale neural networks and avoiding over-fitting problems.
  • Figure 1 is a schematic diagram of a CNN model
  • FIG. 2 is a schematic diagram of a neural network model of Embodiment 1;
  • FIG. 3 is a schematic diagram of a method of training a neural network model of Embodiment 1;
  • FIG. 4 is a schematic diagram of a method of extracting a portion of a neural network model of Embodiment 1;
  • FIG. 5 is a schematic diagram of a conventional convolutional neural network model of Embodiment 1;
  • FIG. 6 is a schematic diagram showing a processing manner of the relaxed convolution layer of Embodiment 1;
  • FIG. 7 is a schematic diagram showing a processing manner of a general convolution layer of Embodiment 1;
  • FIG. 8 is a schematic diagram of a neural network sub-model of Embodiment 1;
  • FIG. 9 is a schematic diagram of a method for initializing weights in a neural network model in Embodiment 1;
  • Figure 10 (A) is a schematic diagram of an input layer and a convolution layer of the optimized neural network sub-model
  • Figure 10 (B) is a schematic diagram of an input layer and a convolution layer of an ordinary convolutional neural network model after initialization;
  • Figure 11 (A) is a schematic diagram of a pooled layer and a convolutional layer of the optimized neural network sub-model
  • Figure 11 (B) is a schematic diagram of a pooling layer and a convolution layer of an ordinary convolutional neural network model after initialization;
  • Figure 12 (A) is a partial schematic view of the pooling layer and the convolution layer of Figure 11 (A);
  • Figure 12 (B) is a partial schematic view of the pooling layer and the convolution layer of Figure 11 (B);
  • Figure 13 (A) is a schematic diagram of the fully connected layer of the optimized neural network sub-model and its previous hidden layer
  • Figure 13 (B) is a schematic diagram of the fully connected layer of the normal convolutional neural network model after initialization and its previous hidden layer;
  • Figure 14 (A) is a partial schematic view of the fully connected layer of Figure 13 (A) and its previous hidden layer;
  • Figure 14 (B) is a partial schematic view of the fully connected layer of Figure 13 (B) and its previous hidden layer;
  • FIG. 15 is another schematic diagram of a method for initializing weights in a neural network model in Embodiment 1;
  • FIG. 16 is a schematic diagram of an apparatus for training a neural network model of Embodiment 2;
  • Figure 17 is a schematic illustration of the extraction unit of Embodiment 2.
  • Figure 18 is a schematic diagram of an initialization unit of Embodiment 2;
  • Figure 19 is another schematic diagram of the initialization unit of Embodiment 2.
  • FIG. 20 is a schematic block diagram showing the system configuration of the electronic device 2000 of the third embodiment.
  • Embodiment 1 of the present application provides a method of training a neural network model for determining weights in a neural network model.
  • the neural network model 200 includes an input layer 201, a convolutional layer 202, a pooling layer 203, a relaxed convolutional layer 204, a fully connected layer 205, and an output layer 206, wherein the convolutional layer 202, the pooling layer 203, the relaxed convolutional layer 204, and the fully connected layer 205 are all hidden layers.
  • the input layer 201 can input the data to be identified 2011; each of the convolutional layer 202, the pooling layer 203, the relaxed convolutional layer 204, the fully connected layer 205, and the output layer 206 receives the data output by the previous layer, processes it with the weights corresponding to that layer to generate the data output by that layer, and outputs the data from the neuron nodes of that layer; the neuron nodes of the respective layers are 2021-2024, 2031-2034, 2041-2046, 2051-2058, and 2061-20610, and, based on the data output by the neuron nodes of the output layer 206, the probability that the data to be identified 2011 belongs to each category 206a can be determined; further, in FIG. 2, only neuron nodes 2021, 2024, 2031, 2034, 2041, 2046, 2051, 2058, 2061, and 20610 are labeled, and the other neuron nodes are not labeled.
  • the data to be identified 2011 input by the input layer 201 may be a handwritten digit image; the data output by the neuron nodes of the convolutional layer 202, the pooling layer 203, and the relaxed convolutional layer 204 may be feature maps; and the data output by the neuron nodes of the fully connected layer 205 and the output layer 206 may be numerical values.
  • each of the digits 0-9 may correspond to a category 206a; therefore, based on the data output by the output layer 206, the probability that the data to be identified 2011 belongs to each of the digits 0-9 can be determined.
  • the weights corresponding to each layer are selected to ensure that the classification result output by the output layer 206 is accurate, wherein the weight corresponding to each layer may be an m*n matrix, where m and n are both natural numbers.
  • the method for training the neural network model in this embodiment is for determining the weight corresponding to each layer in the neural network model.
  • the neural network model 200 has a relaxed convolution layer 204. Therefore, the neural network model 200 belongs to a convolutional neural network model.
  • however, the neural network model 200 of the present embodiment may also have no relaxed convolutional layer 204; this embodiment is not limited in this respect. Moreover, the method for training the neural network model described in this embodiment is applicable not only to the convolutional neural network model but also to other neural network models.
  • FIG. 3 is a schematic diagram of a method for training a neural network model according to the embodiment. As shown in FIG. 3, the method includes: step S301, extracting a part of the neural network model to form a neural network sub-model; step S302, training the neural network sub-model to form an optimized neural network sub-model; step S303, initializing each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialized neural network model; and step S304, adjusting each weight in the initialized neural network model based on a known training set.
  • in this way, a small-scale sub-model is trained first, the trained sub-model is then used to initialize the large-scale neural network model, and finally the large-scale neural network model is fine-tuned.
  • compared with a method of directly training a large-scale neural network model, the method of the present embodiment can avoid problems such as overfitting and excessive training time, as sketched below.
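  • As a reading aid, the following framework-agnostic sketch restates the order of steps S301-S304; the helper callables (extract_submodel, train, initialize_from_submodel) are hypothetical names supplied by the caller and are not defined by the patent.

```python
# Framework-agnostic sketch of the flow of FIG. 3 (steps S301-S304).
# extract_submodel, train and initialize_from_submodel are hypothetical
# callables supplied by the caller; only the ordering of the steps matters here.
def train_via_submodel(full_model, train_set,
                       extract_submodel, train, initialize_from_submodel,
                       keep_ratio=0.5, submodel_epochs=50, finetune_epochs=5):
    sub_model = extract_submodel(full_model, keep_ratio)      # S301: extract a part of the model
    train(sub_model, train_set, epochs=submodel_epochs)       # S302: train the small sub-model fully
    initialize_from_submodel(full_model, sub_model)           # S303: copy/scale weights into the full model
    train(full_model, train_set, epochs=finetune_epochs)      # S304: only a few rounds of fine-tuning
    return full_model
```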
  • FIG. 4 is a schematic diagram of a method of extracting a part of a neural network model of the embodiment, as shown in FIG. 4, the method includes:
  • step S401 the relaxed convolutional layer 204 in the neural network model 200 of FIG. 2 is converted into a normal convolutional layer, thereby transforming the neural network model 200 from a convolutional neural network model to a general convolutional neural network model.
  • FIG. 5 is a schematic diagram of the common convolutional neural network model 500, in which the relaxed convolutional layer 204 has been transformed into a common convolutional layer 504; the other layers of the common convolutional neural network model 500 are identical to those of the neural network model 200.
  • the data processing manner of the common convolutional layer 504 differs from that of the relaxed convolutional layer 204 in that, in the common convolutional layer 504, the data at different positions within the same neuron node participating in the convolution operation share a weight, whereas in the relaxed convolutional layer 204, the data at different positions within the same neuron node participating in the convolution operation do not share any weight.
  • FIG. 6 is a schematic diagram of the processing manner of the relaxed convolutional layer 204 of the present embodiment.
  • in FIG. 6, P1 and P2 are different neuron nodes participating in the convolution operation, P11 and P14 are data at different positions in P1, P21 and P24 are data at different positions in P2, W11, W14, W21, and W24 are different weights, and T11 and T14 are data generated by the convolution operation, wherein T11 and T14 are calculated as shown in the following equations (1) and (2):

T11 = P11 × W11 + P21 × W21    (1)
T14 = P14 × W14 + P24 × W24    (2)
  • that is, in FIG. 6, the data P11 and P14 in the neuron node P1 correspond to the independent weights W11 and W14, respectively, and the data P21 and P24 in the neuron node P2 correspond to the independent weights W21 and W24, respectively; in other words, data at different positions within the same neuron node do not share any weight.
  • FIG. 7 shows the processing manner of the common convolutional layer, in which T11 and T14 are calculated as shown in the following equations (3) and (4):

T11 = P11 × W1 + P21 × W2    (3)
T14 = P14 × W1 + P24 × W2    (4)
  • the data P11 and P14 at different positions in the neuron node P1 share the weight W1
  • the data P21 and P24 at different positions in the neuron node P2 share the weight W2.
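  • The contrast between the two layer types can be checked numerically. The sketch below is an illustrative numpy rendering (not taken from the patent) that treats P1 and P2 as one-dimensional maps and assumes the output is a position-wise weighted sum of the participating maps; the only difference between the two cases is whether the weights are shared across positions.

```python
import numpy as np

# P1, P2: two input neuron nodes (feature maps), simplified to 1-D maps of 4 positions.
positions = 4
rng = np.random.default_rng(0)
P1 = rng.standard_normal(positions)
P2 = rng.standard_normal(positions)

# Common convolutional layer (FIG. 7): every position of the same input map
# shares a single weight (W1 for P1, W2 for P2).
W1, W2 = 0.3, -0.7
T_common = P1 * W1 + P2 * W2                   # e.g. T11 = P11*W1 + P21*W2

# Relaxed convolutional layer (FIG. 6): every position of the same input map
# has its own independent weight (W11..W14 for P1, W21..W24 for P2).
W1_relaxed = rng.standard_normal(positions)
W2_relaxed = rng.standard_normal(positions)
T_relaxed = P1 * W1_relaxed + P2 * W2_relaxed  # e.g. T11 = P11*W11 + P21*W21

# Same output size, but the relaxed layer carries `positions` times more weights.
print(T_common.shape, T_relaxed.shape)
```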
  • in step S401 of the present embodiment, part of the weights in the relaxed convolutional layer 204 of the neural network model 200 may be deleted to reduce the number of weights, so that the data at different positions within the same neuron node participating in the convolution operation share a weight, thereby transforming the relaxed convolutional layer 204 into a common convolutional layer 504 and converting the neural network model 200 into the common convolutional neural network model 500.
  • in step S402 of the present embodiment, the neuron nodes of each hidden layer of the common convolutional neural network model 500 can be deleted according to a certain proportion, thereby obtaining the neural network sub-model, wherein the proportion of neuron nodes deleted from each hidden layer can be the same or different.
  • FIG. 8 is a schematic diagram of the neural network sub-model 800 of the present embodiment.
  • in FIG. 8, 50% of the neuron nodes in each hidden layer are deleted, thereby forming the neural network sub-model 800, wherein 801 in FIG. 8 denotes a neuron node deleted from the common convolutional neural network model 500; the input layer 201 and the output layer 206 of FIG. 8 are the same as the input layer 201 and the output layer 206 of FIG. 5, respectively; and the convolutional layer 802, the pooling layer 803, the convolutional layer 804, and the fully connected layer 805 of FIG. 8 correspond to the convolutional layer 202, the pooling layer 203, the common convolutional layer 504, and the fully connected layer 205 of FIG. 5, respectively.
  • in this embodiment, the neural network model 200 is first converted into a common convolutional neural network model, and then neuron nodes are deleted from the common convolutional neural network model to obtain the neural network sub-model 800.
  • the purpose of transforming the neural network model 200 into a common convolutional neural network model is to reduce the number of weights in the subsequently generated neural network sub-model 800 and avoid over-fitting.
  • however, the present embodiment is not limited thereto; if the neural network model 200 does not have the relaxed convolutional layer 204, neuron nodes can be deleted directly from the neural network model 200, as sketched below.
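  • A minimal sketch of the node-extraction idea, under the assumption that the hidden layers are stored as plain weight matrices; keeping simply the first half of the nodes and using a fully connected representation are illustrative choices, not prescribed by the patent.

```python
import numpy as np

# Keep only a fraction of the neuron nodes (here: hidden units / output channels)
# of a hidden layer, and drop the corresponding inputs of the next layer.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((784, 8))   # hidden layer of the full model: 8 neuron nodes
W2 = rng.standard_normal((8, 10))    # output layer: 10 classes (kept in full)

keep_ratio = 0.5
kept = np.arange(W1.shape[1])[: int(W1.shape[1] * keep_ratio)]   # indices of kept nodes

W1_sub = W1[:, kept]                 # sub-model hidden layer: 4 neuron nodes
W2_sub = W2[kept, :]                 # next layer only receives the kept nodes' outputs
print(W1_sub.shape, W2_sub.shape)    # (784, 4) (4, 10)
```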
  • in this embodiment, the neural network sub-model 800 can be trained according to a known training set to determine an optimized value of each of its weights, thereby training the neural network sub-model 800 into an optimized neural network sub-model.
  • the method for training the neural network sub-model 800 can refer to the prior art, and details are not described in this embodiment.
  • the optimized neural network sub-model can then be used to initialize the weights in the neural network model 200, and the initialized neural network model 200 has the same output characteristics as the optimized neural network sub-model.
  • FIG. 9 is a schematic diagram of a method for initializing each weight in the neural network model according to the embodiment, for implementing step S303. As shown in FIG. 9, the method includes:
  • step S901: initializing, according to each weight in the optimized neural network sub-model, each weight in the common convolutional neural network model to form an initialized common convolutional neural network model; and step S902: converting the common convolutional layer in the initialized common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
  • the weights of the corresponding hidden layers in the common convolutional neural network model 500 may be initialized according to the weights of the hidden layers in the optimized neural network submodel.
  • the weights of the hidden layers in the optimized neural network submodel are multiplied by a predetermined coefficient as the weights of the corresponding hidden layers in the general convolutional neural network model.
  • the convolutional layer 202 is connected to the input layer 201 and is the first hidden layer after the input layer 201.
  • the input data of the convolutional layer 202 is the data to be identified of the input layer 201, and the data to be identified is convoluted with the weight of the convolutional layer 202 to obtain the output data of each neuron node of the convolutional layer 202.
  • FIG. 10(A) is a schematic diagram of the input layer 201 and the convolution layer 802 of the optimized neural network sub-model 800
  • FIG. 10(B) is a schematic diagram of the input layer 201 and the convolutional layer 202 of the initialized common convolutional neural network model 500.
  • in FIG. 10(A), the input data is convolved with the weight K1 to obtain the feature map A1 output by the neuron node 8021, and the input data is convolved with the weight K2 to obtain the feature map A2 output by the other neuron node of the convolutional layer 802.
  • in FIG. 10(B), the weight K1 is multiplied by a predetermined coefficient L11 and used as the weight corresponding to the neuron nodes 2021 and 2023 of the convolutional layer 202 of the common convolutional neural network model 500, and the weight K2 is multiplied by a predetermined coefficient L12 and used as the weight corresponding to the neuron nodes 2022 and 2024 of the convolutional layer 202 of the common convolutional neural network model 500; the weights of the convolutional layer 202 of the common convolutional neural network model 500 are thus initialized.
  • in this embodiment, the predetermined coefficients L11 and L12 may both be 1; therefore, the feature maps output by the neuron nodes 2021, 2022, 2023, and 2024 of the convolutional layer 202 of the common convolutional neural network model 500 are A1, A2, A1, and A2, respectively.
  • L11, L12 may have other values, and may be different from each other.
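  • The copy-and-scale initialization of the first convolutional layer can be sketched as follows; the kernel shapes, the interleaved copy order (K1, K2, K1, K2), and the use of numpy are illustrative assumptions rather than details fixed by the patent.

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.standard_normal((2, 3, 3))      # sub-model kernels K1, K2 (2 neuron nodes, 3x3 each)
L = np.array([1.0, 1.0])                # predetermined coefficients L11, L12

# The widened first layer has twice as many nodes:
# node 2021 <- L11*K1, node 2022 <- L12*K2, node 2023 <- L11*K1, node 2024 <- L12*K2.
K_full = np.concatenate([L[:, None, None] * K,
                         L[:, None, None] * K], axis=0)
print(K_full.shape)                     # (4, 3, 3)

# With L11 = L12 = 1 the four nodes produce the feature maps A1, A2, A1, A2,
# i.e. the widened layer reproduces the sub-model's outputs (duplicated).
assert np.allclose(K_full[0], K[0]) and np.allclose(K_full[2], K[0])
```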
  • the feature map output by each neuron node of the convolution layer 202 is used as input data of the pooling layer 203
  • the feature map output by the pooling layer 203 is used as input data of the convolution layer 504
  • the convolution layer 504 is a non-first hidden layer.
  • FIG. 11(A) is a schematic diagram of the pooled layer 803 and the convolution layer 804 of the optimized neural network submodel 800
  • FIG. 11(B) is a schematic diagram of the pooling layer 203 and the convolutional layer 504 of the initialized common convolutional neural network model 500.
  • the feature maps B1, B2 outputted by the respective neuron nodes of the pooling layer 803 are used to generate feature maps C1-C3 of the respective neuron nodes of the convolutional layer 804.
  • A1 and A2 of FIG. 10(A) and the corresponding weights are respectively pooled to obtain B1 and B2.
  • the feature maps B1-B4 outputted by the respective neuron nodes of the pooling layer 203 are used to generate feature maps C1'-C6' of the respective neuron nodes of the convolutional layer 504.
  • A1-A4 of FIG. 10(B) and the corresponding weights are respectively pooled to obtain B1-B4, and each weight in the pooling layer 203 may be initialized such that the pooling layer 203 of the common convolutional neural network model 500 has the same output characteristics as the pooling layer 803 of the optimized neural network sub-model 800.
  • FIG. 12(A) is a partial schematic view of the pooling layer 803 and the convolution layer 804 of FIG. 11(A)
  • FIG. 12(B) is a partial schematic view of the pooling layer 203 and the convolution layer 504 of FIG. 11(B).
  • in FIGS. 12(A) and 12(B), the corresponding weights are shown.
  • in FIG. 12(B), the weight K3 is multiplied by a predetermined coefficient L21 to obtain K3', which is used as the weight corresponding to the feature maps B1 and B3 in the convolutional layer 504 of the common convolutional neural network model 500, and the weight K4 is multiplied by a predetermined coefficient L22 to obtain K4', which is used as the weight corresponding to the feature maps B2 and B4; the weights of the convolutional layer 504 of the common convolutional neural network model 500 are thus initialized.
  • thereby, C1' = C1, so that the convolutional layer 504 of the initialized common convolutional neural network model 500 has the same output characteristics as the convolutional layer 804 of the optimized neural network sub-model 800.
  • L21, L22 may have other values, and may be different from each other.
  • similarly, the weights between B1-B4 and C2' of FIG. 11(B) can be initialized in a manner similar to the above, using the weights between B1, B2, and C2 of FIG. 11(A); the weights between B1-B4 and C4' of FIG. 11(B) can be initialized using the weights between B1, B2, and C1 of FIG. 11(A); and the weights between B1-B4 and C5' of FIG. 11(B) can be initialized using the weights between B1, B2, and C2 of FIG. 11(A).
  • the initialization methods of the weights in the other convolutional layers are similar to the initialization methods for the weights in the convolutional layer 504.
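  • For a non-first hidden layer the input feature maps themselves are duplicated, so the copied incoming weights have to be scaled if the outputs are to stay numerically identical. The sketch below uses 1×1 maps (so the convolution degenerates to a matrix product) and assumes a coefficient of 0.5, i.e. the kept ratio; the patent only speaks of a "predetermined coefficient", so this particular value is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal(2)                    # sub-model inputs B1, B2 (1x1 feature maps)
K_sub = rng.standard_normal((2, 3))           # sub-model weights from B1, B2 to C1..C3
C = B @ K_sub                                 # sub-model outputs C1..C3

B_full = np.concatenate([B, B])               # full-model inputs B1..B4 with B3 = B1, B4 = B2
coeff = 0.5                                   # assumed predetermined coefficient (= kept ratio)
K_scaled = coeff * np.concatenate([K_sub, K_sub], axis=0)   # weights from B1..B4 (K3', K4' reused for B3, B4)
K_full = np.concatenate([K_scaled, K_scaled], axis=1)       # duplicate the output nodes C1'..C6'
C_full = B_full @ K_full

# The widened layer reproduces the sub-model's outputs (duplicated): C1'..C6' = C1, C2, C3, C1, C2, C3.
assert np.allclose(C_full, np.concatenate([C, C]))
```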
  • the fully connected layer 205 may be located after all the convolutional layers; the output layer 206 may be connected after the fully connected layer 205, that is, the fully connected layer 205 is the last hidden layer, or other fully connected layers may be connected after the fully connected layer 205, that is, the fully connected layer 205 is a non-final hidden layer.
  • the initialization method for each weight of the fully connected layer 205 may refer to the initialization method for the convolutional layer 504 located after the first hidden layer, except that in the fully connected layer 205 the convolution operation is replaced by a multiplication operation, because the fully connected layer 205 can be regarded as a convolutional layer whose weights are 1×1.
  • the fully connected layer 205, which is the last hidden layer, has as many neuron nodes as there are classes in the output layer 206; therefore, for the fully connected layer 805 of the optimized neural network sub-model 800 and the fully connected layer 205 of the initialized common convolutional neural network model 500, the number of neuron nodes is the same, but the number of input data of the two may differ.
  • FIG. 13(A) is a schematic diagram of the fully connected layer 805 of the optimized neural network submodel 800 and its previous hidden layer
  • FIG. 13(B) is a schematic diagram of the fully connected layer 205 of the initialized common convolutional neural network model 500 and its previous hidden layer.
  • the data F1, F2 outputted by the neuron nodes of the previous hidden layer are used to generate the output data E1-E3 of the respective neuron nodes of the fully connected layer 805.
  • the data F1-F4 outputted by the respective neuron nodes of the previous hidden layer are used to generate the output data E1'-E3' of the respective neuron nodes of the fully connected layer 205.
  • the data F1-F4 may be in the form of a floating point number.
  • in FIG. 13(A), the number of neuron nodes of the previous hidden layer is reduced by half compared with that of FIG. 13(B), owing to the earlier operation of extracting the sub-model.
  • the number of data outputted by the previous hidden layer in Fig. 13(A) is also half the number of data outputted by the previous hidden layer in Fig. 13(B).
  • FIG. 14(A) is a partial schematic view of the fully connected layer 805 of FIG. 13(A) and its previous hidden layer, and FIG. 14(B) is a partial schematic view of the fully connected layer 205 of FIG. 13(B) and its previous hidden layer.
  • in FIG. 14(A), F1 and F2 are multiplied by the weights K5 and K6 to obtain the data E1 output by the neuron node 8051, the multiplication being as shown in the following equation (7):

E1 = F1 × K5 + F2 × K6    (7)
  • in FIG. 14(B), the weight K5 is multiplied by a predetermined coefficient L31 to obtain K5', which is used as the weight corresponding to F1 and F3 in the fully connected layer 205 of the common convolutional neural network model 500, and the weight K6 is multiplied by a predetermined coefficient L32 to obtain K6', which is used as the weight corresponding to F2 and F4 in the fully connected layer 205 of the common convolutional neural network model 500; the weights of the fully connected layer 205 of the common convolutional neural network model 500 are thus initialized.
  • thereby, E1' = E1, so that the fully connected layer 205 of the initialized common convolutional neural network model 500 has the same output characteristics as the fully connected layer 805 of the optimized neural network sub-model 800.
  • L31 and L32 may have other values, and may be different from each other.
  • similarly, the weights between F1-F4 and E2' of FIG. 13(B) can be initialized in a manner similar to the above, using the weights between F1, F2, and E2 of FIG. 13(A).
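  • The same idea applies to the last fully connected layer, where the convolution degenerates to a multiplication and the number of output nodes (classes) stays fixed; as before, the coefficient 0.5 used below is an illustrative assumption that makes E1' equal E1 exactly, not a value prescribed by the patent.

```python
import numpy as np

rng = np.random.default_rng(3)
F = rng.standard_normal(2)                    # sub-model inputs F1, F2
K56 = rng.standard_normal((2, 10))            # sub-model weights to the 10 class nodes E1..E10
E = F @ K56                                   # e.g. E1 = F1*K5 + F2*K6, cf. equation (7)

F_full = np.concatenate([F, F])               # full-model inputs F1..F4 with F3 = F1, F4 = F2
K_full = 0.5 * np.concatenate([K56, K56], axis=0)   # K5', K6' reused for the duplicated inputs F3, F4
E_full = F_full @ K_full                      # class nodes are not duplicated

assert np.allclose(E_full, E)                 # same output characteristics: identical class scores
```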
  • the weights in the ordinary convolutional neural network model 500 shown in FIG. 5 can be initialized.
  • the common convolutional neural network model 500 is not limited to the structure shown in FIG. 5.
  • the ordinary convolutional neural network model 500 may have other hidden layers.
  • the process of converting the common convolutional layer into the relaxed convolutional layer may be the reverse of step S401; that is, in step S902, the weights in the common convolutional layer may be copied into multiple copies, so that the data at different positions within the same neuron node participating in the convolution operation correspond to different weights; for example, the weight W1 in FIG. 7 is copied into W11 and W14, and the weight W2 is copied into W21 and W24, thereby converting the common convolutional layer into a relaxed convolutional layer.
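  • Converting a common convolutional layer back into a relaxed one therefore amounts to giving every output position its own copy of each shared weight; the sketch below assumes a 2×2 output grid and two participating input maps purely for illustration.

```python
import numpy as np

W_shared = np.array([0.3, -0.7])              # one shared weight per participating input map (W1, W2)
out_positions = (2, 2)                        # positions of the output feature map (illustrative)

# Relaxed layer: an independent copy of every shared weight at every output position,
# e.g. W1 -> W11..W14 and W2 -> W21..W24.
W_relaxed = np.broadcast_to(W_shared[:, None, None],
                            (W_shared.size, *out_positions)).copy()
print(W_relaxed.shape)                        # (2, 2, 2)

# Right after the copy the relaxed layer computes exactly what the common layer
# computed; the copies only start to differ during subsequent fine-tuning.
assert np.allclose(W_relaxed[0], W_shared[0]) and np.allclose(W_relaxed[1], W_shared[1])
```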
  • FIG. 15 is another schematic diagram of a method for initializing each weight in the neural network model in the embodiment, for implementing step S303. As shown in Figure 15, the method includes:
  • step S901: initializing, according to each weight in the optimized neural network sub-model, each weight in the common convolutional neural network model to form an initialized common convolutional neural network model; step S1501: adjusting, based on a known training set, each weight in the initialized common convolutional neural network model to form an adjusted common convolutional neural network model; and step S1502: converting the common convolutional layer in the adjusted common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
  • compared with the method of FIG. 9, step S1501 is added to the method of FIG. 15; that is, after the common convolutional neural network model is initialized, the common convolutional neural network model is adjusted based on the known training set, whereby the amount of work required for the adjustment in step S304 can be reduced.
  • regarding step S1501, reference may be made to the method for adjusting a neural network model in the prior art, which is not described in detail in this embodiment.
  • the processing manner of converting the common convolutional layer into the relaxed convolutional layer in step S1502 may be the same as the processing manner of step S902.
  • in the above description, the weights in the common convolutional neural network model are initialized in step S901, and the common convolutional layer is converted into a relaxed convolutional layer by step S902 or by steps S1501 and S1502, so as to initialize the weights of the neural network model 200; however, the embodiment is not limited thereto, and if the neural network model 200 does not have a relaxed convolutional layer, the weights of the neural network model 200 may be directly initialized in step S901 without step S902 or steps S1501 and S1502.
  • in step S304, the weights in the initialized neural network model may be adjusted based on the known training set; for the manner of the adjustment, reference may be made to the prior art, which is not described in detail in this embodiment.
  • in this embodiment, a small-scale neural network model is trained, a large-scale neural network model is initialized by the small-scale neural network model, and finally the initialized large-scale neural network model is fine-tuned; because the small-scale network has already completed most of the training work, the large-scale network only needs to be fine-tuned for several rounds to converge, thereby avoiding the over-fitting and excessive training time caused by directly training a large-scale neural network.
  • Embodiment 2 provides an apparatus for training a neural network model, which corresponds to the method of Embodiment 1.
  • the apparatus 1600 includes: an extracting unit 1601, a first training unit 1602, an initializing unit 1603, and a second training unit 1604.
  • the extracting unit 1601 is configured to extract a part of the neural network model to form a neural network sub-model; the first training unit 1602 is configured to train the neural network sub-model to form an optimized neural network sub-model; the initializing unit 1603 initializes each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialized neural network model, the initialized neural network model and the optimized neural network sub-model having the same output characteristics; and the second training unit 1604 adjusts each weight in the initialized neural network model based on the known training set.
  • the extracting unit 1601 includes a first converting unit 1701 and an extracting subunit 1702.
  • the first conversion unit 1701 is configured to convert the relaxed convolution layer in the neural network model into a common convolution layer to convert the neural network model into a common convolutional neural network model; and the extraction subunit 1702 is used to A portion of the neuron nodes in each of the hidden layers of the common convolutional neural network model are extracted to form the neural network sub-model.
  • FIG. 18 is a schematic diagram of the initialization unit 1603 of the second embodiment. As shown in FIG. 18, the initialization unit 1603 includes a first initialization subunit 1801 and a second conversion unit 1802.
  • the first initialization subunit 1801 is configured to initialize each weight in the common convolutional neural network model according to each weight in the optimized neural network submodel to form an initialization common convolutional neural network model;
  • the second transforming unit 1802 is configured to convert the normal convolutional layer in the initialized normal convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
  • in this embodiment, the first initialization subunit 1801 may initialize the weights of the corresponding hidden layers in the common convolutional neural network model according to the weights of the hidden layers in the optimized neural network sub-model, to form the initialized common convolutional neural network model, wherein the output characteristics of the hidden layers of the initialized common convolutional neural network model are the same as the output characteristics of the hidden layers of the optimized neural network sub-model; for example, the first initialization subunit 1801 may multiply the weights of a hidden layer in the optimized neural network sub-model by a predetermined coefficient and use the results as the weights of the corresponding hidden layer in the common convolutional neural network model.
  • the second converting unit 1802 may convert the ordinary convolutional neural network model into a relaxed convolutional neural network model by using an operation opposite to that of the first converting unit 1701.
  • FIG. 19 is another schematic diagram of the initializing unit 1603 of the second embodiment.
  • the initializing unit 1603 includes: a second initializing subunit 1901, a third training unit 1902, and a third conversion unit 1903.
  • the second initialization subunit 1901 initializes each weight in the common convolutional neural network model according to each weight in the optimized neural network submodel to form an initial common convolutional neural network model;
  • the third training unit 1902 adjusts, based on the known training set, each weight in the initialized common convolutional neural network model to form an adjusted common convolutional neural network model;
  • the third conversion unit 1903 is configured to convert the common convolutional layer in the adjusted common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
  • the processing manner of the second initialization subunit 1901 may be the same as that of the first initialization subunit 1801; for the manner in which the third training unit 1902 adjusts the weights in the common convolutional neural network model, reference may be made to the prior art; and for the manner in which the third conversion unit 1903 converts the common convolutional layer into a relaxed convolutional layer, reference may be made to the second conversion unit 1802.
  • in this embodiment, a small-scale neural network model is trained, a large-scale neural network model is initialized by the small-scale neural network model, and finally the initialized large-scale neural network model is fine-tuned; because the small-scale network has already completed most of the training work, the large-scale network only needs to be fine-tuned for several rounds to converge, thereby avoiding the over-fitting and excessive training time caused by directly training a large-scale neural network.
  • Embodiment 3 of the present application provides an electronic device including the device for training a neural network model as described in Embodiment 2.
  • FIG. 20 is a schematic block diagram showing the system configuration of an electronic device 2000 according to an embodiment of the present invention.
  • the electronic device 2000 can include a central processor 2100 and a memory 2140; the memory 2140 is coupled to the central processor 2100.
  • the figure is exemplary; other types of structures may be used in addition to or in place of the structure to implement telecommunications functions or other functions.
  • the functionality of the device that trains the neural network model can be integrated into the central processor 2100.
  • in one implementation, the central processing unit 2100 can be configured to: extract a part of the neural network model to form a neural network sub-model; train the neural network sub-model to form an optimized neural network sub-model; initialize each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialized neural network model, the initialized neural network model having the same output characteristics as the optimized neural network sub-model; and adjust each weight in the initialized neural network model based on a known training set.
  • the central processing unit 2100 may be further configured to: convert the relaxed convolution layer in the neural network model into a common convolution layer to convert the neural network model into a common convolutional neural network model; extract the A part of the neuron nodes in each hidden layer of the general convolutional neural network model to form the neural network sub-model.
  • the central processing unit 2100 may be further configured to initialize each weight in the common convolutional neural network model according to each weight in the optimized neural network sub-model to form an initialized common convolutional neural network. a model; transforming a common convolutional layer in the initialized general convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
  • the central processing unit 2100 may be further configured to: initialize each weight in the common convolutional neural network model according to each weight in the optimized neural network sub-model to form an initialized common convolutional neural network model; adjust, based on the known training set, each weight in the initialized common convolutional neural network model to form an adjusted common convolutional neural network model; and convert the common convolutional layer in the adjusted common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
  • the central processing unit 2100 may be further configured to: perform weights of corresponding hidden layers in the common convolutional neural network model according to weights of respective hidden layers in the optimized neural network sub-model Initializing to form an initialization common convolutional neural network model, wherein the output characteristics of the implicit layers of the initialized normal convolutional neural network model are the same as the output characteristics of the hidden layers of the optimized neural network submodel.
  • the central processing unit 2100 may be further configured to: multiply each weight of the hidden layer in the optimized neural network sub-model by a predetermined coefficient as a corresponding implicit in the common convolutional neural network model Contains the weights of the layers.
  • the apparatus for training the neural network model may be configured separately from the central processing unit 2100.
  • for example, the apparatus for training the neural network model may be configured as a chip connected to the central processing unit 2100, and its functionality is implemented under the control of the central processing unit 2100.
  • the electronic device 2000 may further include: a communication module 2110, an input unit 2120, an audio processing unit 2130, a display 2160, and a power source 2170. It should be noted that the electronic device 2000 does not have to include all the components shown in FIG. 20; in addition, the electronic device 2000 may further include components not shown in FIG. 20, and reference may be made to the prior art.
  • the central processor 2100, also sometimes referred to as a controller or an operation control, can include a microprocessor or other processor device and/or logic device, and receives input and controls the operation of each component of the electronic device 2000.
  • the memory 2140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device, and may store a program for executing related information; the central processing unit 2100 can execute the program stored in the memory 2140 to implement information storage, processing, and the like. The functions of the other components are similar to those of the prior art and are not described here.
  • the various components of electronic device 2000 may be implemented by special purpose hardware, firmware, software, or a combination thereof without departing from the scope of the invention.
  • the embodiment of the present application further provides a computer readable program, wherein when the program is executed in an information processing device or an electronic device, the program causes the information processing device or the electronic device to perform the method of training a neural network model described in Embodiment 1.
  • the embodiment of the present application further provides a storage medium storing a computer readable program, wherein the computer readable program causes an information processing device or an electronic device to perform the method of training a neural network model described in Embodiment 1.
  • the apparatus for training a neural network model described in connection with an embodiment of the present invention may be directly embodied as hardware, a software module executed by a processor, or a combination of both.
  • one or more of the functional blocks shown in Figures 16-19 and/or one or more combinations of the functional blocks may correspond to software modules of a computer program flow or to hardware modules.
  • Each software module can also correspond to each hardware module.
  • These software modules may correspond to the respective steps shown in Embodiment 1, respectively.
  • These hardware modules can be implemented, for example, by solidifying these software modules into a Field Programmable Gate Array (FPGA).
  • the software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.
  • a storage medium can be coupled to the processor to enable the processor to read information from, and write information to, the storage medium; or the storage medium can be an integral part of the processor.
  • the processor and the storage medium can be located in an ASIC.
  • the software module can be stored in the memory of the mobile terminal or in a memory card that can be inserted into the mobile terminal.
  • the software module can be stored in the MEGA-SIM card or a large-capacity flash memory device.
  • One or more of the functional blocks described with respect to Figures 16-19 and/or one or more combinations of the functional blocks may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described herein.
  • One or more of the functional blocks described with respect to Figures 16-19 and/or one or more combinations of the functional blocks may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, or any other such configuration.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A method and device for training a neural network model, and an electronic device. The method comprises: extracting a part of a neural network model to form a neural network sub-model; training the neural network sub-model to form an optimized neural network sub-model; initializing weights in the neural network model according to weights in the optimized neural network sub-model to form an initialized neural network model, the initialized neural network model and the optimized neural network sub-model having the same output features; and adjusting the weights in the initialized neural network model on the basis of a known training set. The method can shorten the large-scale neural network training time and avoid the over-fitting problem.

Description

Method, device and electronic device for training a neural network model

Technical Field

The present application relates to the field of information processing technologies, and in particular, to a method, a device, and an electronic device for training a neural network model.

Background Art

In recent years, classification methods based on the Convolutional Neural Network (CNN) have achieved great success in the field of handwritten character recognition.

The CNN model is a hierarchical model. FIG. 1 is a schematic diagram of the CNN model. As shown in FIG. 1, the CNN model is composed of an input layer 101, a plurality of hidden layers 102, and an output layer 103. The input layer 101 provides the data to be processed corresponding to the sample to be identified; when the sample to be identified is a grayscale image, the data to be processed is a two-dimensional matrix. The type of a hidden layer 102 may be a common convolutional layer, a relaxed convolutional layer, a pooling layer, a neuron layer, or a fully connected layer, and each type of hidden layer provides a specific operation to process the data. The output layer 103 provides the final result of the model; for a CNN model used for classification, the output layer 103 outputs the probability that the sample to be identified belongs to each class.

It should be noted that the above description of the technical background is only intended to facilitate a clear and complete explanation of the technical solutions of the present invention and to facilitate the understanding of those skilled in the art. These technical solutions should not be considered to be well known to those skilled in the art merely because they are set forth in the background section of the present invention.
Summary of the Invention

Many published experimental results show that the larger the scale of the CNN model, the more accurate the recognition result of the sample to be identified. However, a large-scale CNN model has the following problems during training: a) the larger the model, the easier it is to overfit; b) the larger the model, the longer the training time required.

The embodiments of the present application provide a method, a device, and an electronic device for training a neural network model, in which a small-scale neural network model is trained, a large-scale neural network model is initialized by the small-scale neural network model, and finally the initialized large-scale neural network model is fine-tuned, thereby avoiding problems such as over-fitting and excessive training time caused by directly training a large-scale neural network.

According to a first aspect of the embodiments of the present application, there is provided a method for training a neural network model, for determining the weights in the neural network model, the method comprising:

extracting a part of the neural network model to form a neural network sub-model;

training the neural network sub-model to form an optimized neural network sub-model;

initializing each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialized neural network model, the initialized neural network model and the optimized neural network sub-model having the same output characteristics; and

adjusting each weight in the initialized neural network model based on a known training set.
According to a second aspect of the embodiments of the present application, extracting a part of the neural network model comprises:

converting the relaxed convolutional layer in the neural network model into a common convolutional layer, to convert the neural network model into a common convolutional neural network model; and

extracting a part of the neuron nodes in each hidden layer of the common convolutional neural network model to form the neural network sub-model.

According to a third aspect of the embodiments of the present application, initializing each weight in the neural network model to form the initialized neural network model comprises:

initializing each weight in the common convolutional neural network model according to each weight in the optimized neural network sub-model, to form an initialized common convolutional neural network model; and

converting the common convolutional layer in the initialized common convolutional neural network model into a relaxed convolutional layer, to form the initialized neural network model.

According to a fourth aspect of the embodiments of the present application, initializing each weight in the neural network model to form the initialized neural network model comprises:

initializing each weight in the common convolutional neural network model according to each weight in the optimized neural network sub-model, to form an initialized common convolutional neural network model;

adjusting each weight in the initialized common convolutional neural network model based on a known training set, to form an adjusted common convolutional neural network model; and

converting the common convolutional layer in the adjusted common convolutional neural network model into a relaxed convolutional layer, to form the initialized neural network model.

According to a fifth aspect of the embodiments of the present application, initializing each weight in the common convolutional neural network model to form the initialized common convolutional neural network model comprises:

initializing the weights of the corresponding hidden layers in the common convolutional neural network model according to the weights of the hidden layers in the optimized neural network sub-model, to form the initialized common convolutional neural network model, wherein the output characteristics of the hidden layers of the initialized common convolutional neural network model are the same as the output characteristics of the hidden layers of the optimized neural network sub-model.

According to a sixth aspect of the embodiments of the present application, initializing the weights of the corresponding hidden layers in the common convolutional neural network model according to the weights of the hidden layers in the optimized neural network sub-model comprises:

multiplying the weights of a hidden layer in the optimized neural network sub-model by a predetermined coefficient, and using the results as the weights of the corresponding hidden layer in the common convolutional neural network model.
根据本申请实施例的第七方面,提供一种对神经网络模型进行训练的装置,用于确定神经网络模型中的各权值,该装置包括:According to a seventh aspect of the embodiments of the present application, there is provided an apparatus for training a neural network model for determining weights in a neural network model, the apparatus comprising:
提取单元,其用于提取神经网络模型的一部分,以形成神经网络子模型;An extracting unit for extracting a portion of the neural network model to form a neural network sub-model;
第一训练单元,其用于对所述神经网络子模型进行训练,以形成优化的神经网络子模型;a first training unit for training the neural network sub-model to form an optimized neural network sub-model;
初始化单元,其根据所述优化的神经网络子模型中的各权值,初始化所述神经网络模型中的各权值,以形成初始化神经网络模型,并且,所述初始化神经网络模型与所述优化的神经网络子模型具有相同的输出特性;An initialization unit that initializes each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialization neural network model, and the initialization neural network model and the optimization Neural network submodels have the same output characteristics;
第二训练单元,其基于已知训练集,对所述初始化神经网络模型中的各权值进行调整。A second training unit that adjusts weights in the initialized neural network model based on a known training set.
根据本申请实施例的第八方面,其中,所述提取单元包括:According to an eighth aspect of the embodiments of the present application, the extracting unit includes:
第一转化单元,其用于将所述神经网络模型中的松弛卷积层转化为普通卷积层,以将所述神经网络模型转化为普通卷积神经网络模型;以及a first conversion unit for converting a relaxed convolutional layer in the neural network model into a common convolutional layer to convert the neural network model into a general convolutional neural network model;
提取子单元,其用于提取所述普通卷积神经网络模型的每一个隐含层中的部分神经元节点,以形成所述神经网络子模型。An extraction subunit is configured to extract a partial neuron node in each hidden layer of the common convolutional neural network model to form the neural network submodel.
根据本申请实施例的第九方面,其中,所述初始化单元包括:According to a ninth aspect of the embodiments of the present application, the initialization unit includes:
第一初始化子单元,其用于根据所述优化的神经网络子模型中的各权值,初始化所述普通卷积神经网络模型中的各权值,以形成初始化普通卷积神经网络模型;以及a first initialization subunit, configured to initialize each weight in the common convolutional neural network model according to each weight in the optimized neural network submodel to form an initialization common convolutional neural network model;
第二转化单元,其用于将所述初始化普通卷积神经网络模型中的普通卷积层转化为松弛卷积层,以形成所述初始化神经网络模型。a second transformation unit for converting a common convolutional layer in the initialized general convolutional neural network model into a relaxed convolutional layer to form the initialization neural network model.
根据本申请实施例的第十方面,其中,所述初始化单元包括:According to a tenth aspect of the embodiments of the present application, the initializing unit includes:
第二初始化子单元，根据所述优化的神经网络子模型中的各权值，初始化所述普通卷积神经网络模型中的各权值，以形成初始化普通卷积神经网络模型；a second initialization subunit that initializes the weights in the common convolutional neural network model according to the weights in the optimized neural network sub-model, so as to form an initialized common convolutional neural network model;
第三训练单元,其基于已知训练集,对初始化普通卷积神经网络模型中的各权值进行调整,以形成调整后普通卷积神经网络模型;以及a third training unit that adjusts weights in the initialized general convolutional neural network model based on a known training set to form an adjusted general convolutional neural network model;
第三转化单元,其用于将所述调整后普通卷积神经网络模型中的普通卷积层转化为松弛卷积层,以形成所述初始化神经网络模型。a third transformation unit for converting a common convolutional layer in the adjusted general convolutional neural network model into a relaxed convolutional layer to form the initialization neural network model.
根据本申请实施例的第十一方面，其中，所述第一初始化子单元根据所述优化的神经网络子模型中的各隐含层的权值，对所述普通卷积神经网络模型中对应的隐含层的权值进行初始化，以形成初始化普通卷积神经网络模型，其中，所述初始化普通卷积神经网络模型的各隐含层的输出特性与所述优化的神经网络子模型的各隐含层的输出特性相同。According to an eleventh aspect of the embodiments of the present application, wherein the first initialization subunit initializes the weights of the corresponding hidden layers in the common convolutional neural network model according to the weights of the hidden layers in the optimized neural network sub-model, so as to form an initialized common convolutional neural network model, wherein the output characteristics of the hidden layers of the initialized common convolutional neural network model are the same as the output characteristics of the hidden layers of the optimized neural network sub-model.
根据本申请实施例的第十二方面，其中，所述第一初始化子单元将所述优化的神经网络子模型中的隐含层的各权值乘以预定的系数，作为所述普通卷积神经网络模型中的对应的隐含层的各权值。According to a twelfth aspect of the embodiments of the present application, wherein the first initialization subunit multiplies the weights of a hidden layer in the optimized neural network sub-model by a predetermined coefficient, and uses the results as the weights of the corresponding hidden layer in the common convolutional neural network model.
根据本申请实施例的第十三方面,提供一种电子设备,包括上述实施例第七至第十二方面的任一方面所述的对神经网络模型进行训练的装置。According to a thirteenth aspect of the embodiments of the present application, there is provided an electronic device comprising the apparatus for training a neural network model according to any one of the seventh to twelfth aspects of the embodiments.
根据本申请实施例的第十四方面，提供一种计算机可读程序，其中当在对神经网络模型进行训练的装置或电子设备中执行所述程序时，所述程序使得在所述对神经网络模型进行训练的装置或电子设备执行上述实施例第一至第六方面的任一项方面所述的对神经网络模型进行训练的方法。According to a fourteenth aspect of the embodiments of the present application, there is provided a computer readable program, wherein, when the program is executed in a device or an electronic apparatus for training a neural network model, the program causes the device or electronic apparatus for training a neural network model to perform the method of training a neural network model according to any one of the first to sixth aspects of the above embodiments.
根据本申请实施例的第十五方面,提供一种存储有计算机可读程序的存储介质,其中所述存储介质存储上述实施例第十四方面的计算机可读程序,所述计算机可读程序使得对神经网络模型进行训练的装置或电子设备执行上述实施例第一至第六方面的任一项方面所述的对神经网络模型进行训练的方法。According to a fifteenth aspect of the embodiments of the present application, there is provided a storage medium storing a computer readable program, wherein the storage medium stores the computer readable program of the fourteenth aspect of the above embodiment, the computer readable program A device or an electronic device that trains a neural network model performs the method of training a neural network model as described in any one of the first to sixth aspects of the above embodiments.
本申请实施例的有益效果在于:缩短大规模神经网络的训练时间并避免过拟合问题。The beneficial effects of the embodiments of the present application are: shortening the training time of large-scale neural networks and avoiding over-fitting problems.
参照后文的说明和附图，详细公开了本发明的特定实施方式，指明了本发明的原理可以被采用的方式。应该理解，本发明的实施方式在范围上并不因而受到限制。在所附权利要求的条款的范围内，本发明的实施方式包括许多改变、修改和等同。Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not thereby limited in scope. Within the scope of the terms of the appended claims, the embodiments of the invention include many changes, modifications and equivalents.
针对一种实施方式描述和/或示出的特征可以以相同或类似的方式在一个或更多个其它实施方式中使用，与其它实施方式中的特征相组合，或替代其它实施方式中的特征。Features described and/or illustrated for one embodiment may be used in the same or a similar manner in one or more other embodiments, in combination with features in other embodiments, or in place of features in other embodiments.
应该强调，术语“包括/包含”在本文使用时指特征、整件、步骤或组件的存在，但并不排除一个或更多个其它特征、整件、步骤或组件的存在或附加。It should be emphasized that the term "comprising/including", when used herein, refers to the presence of a feature, a whole, a step or a component, but does not exclude the presence or addition of one or more other features, wholes, steps or components.
附图说明 BRIEF DESCRIPTION OF THE DRAWINGS
在本发明实施例的一个附图或一种实施方式中描述的元素和特征可以与一个或更多个其它附图或实施方式中示出的元素和特征相结合。此外，在附图中，类似的标号表示几个附图中对应的部件，并可用于指示多于一种实施方式中使用的对应部件。The elements and features described in one of the drawings or one embodiment of the embodiments of the invention may be combined with the elements and features shown in one or more other drawings or embodiments. Furthermore, in the drawings, like reference numerals denote corresponding parts in the several drawings and may be used to indicate corresponding parts used in more than one embodiment.
所包括的附图用来提供对本发明实施例的进一步的理解，其构成了说明书的一部分，用于例示本发明的实施方式，并与文字描述一起来阐释本发明的原理。显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。在附图中：The accompanying drawings are included to provide a further understanding of the embodiments of the invention; they constitute a part of the specification, illustrate embodiments of the invention, and, together with the written description, serve to explain the principles of the invention. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained from these drawings by a person of ordinary skill in the art without inventive effort. In the drawings:
图1是CNN模型的示意图;Figure 1 is a schematic diagram of a CNN model;
图2是实施例1的神经网络模型的一个示意图;2 is a schematic diagram of a neural network model of Embodiment 1;
图3是实施例1的对神经网络模型进行训练的方法的一个示意图;3 is a schematic diagram of a method of training a neural network model of Embodiment 1;
图4是实施例1的提取神经网络模型的一部分的方法的一个示意图;4 is a schematic diagram of a method of extracting a portion of a neural network model of Embodiment 1;
图5是实施例1的普通卷积神经网络模型的一个示意图;5 is a schematic diagram of a conventional convolutional neural network model of Embodiment 1;
图6是实施例1的松弛卷积层的处理方式的一个示意图;Figure 6 is a schematic view showing a treatment mode of the slack convolution layer of the first embodiment;
图7是实施例1的普通卷积层的处理方式的一个示意图;7 is a schematic diagram showing a processing manner of a general convolution layer of Embodiment 1;
图8是实施例1的神经网络子模型的一个示意图;8 is a schematic diagram of a neural network sub-model of Embodiment 1;
图9是实施例1对神经网络模型中的各权值进行初始化的方法的一个示意图;9 is a schematic diagram of a method for initializing weights in a neural network model in Embodiment 1;
图10(A)是优化后的神经网络子模型的输入层和卷积层的示意图;Figure 10 (A) is a schematic diagram of an input layer and a convolution layer of the optimized neural network sub-model;
图10(B)是初始化后的普通卷积神经网络模型的输入层和卷积层的示意图;Figure 10 (B) is a schematic diagram of an input layer and a convolution layer of an ordinary convolutional neural network model after initialization;
图11(A)是优化后的神经网络子模型的池化层和卷积层的示意图;Figure 11 (A) is a schematic diagram of a pooled layer and a convolutional layer of the optimized neural network sub-model;
图11(B)是初始化后的普通卷积神经网络模型的池化层和卷积层的示意图;Figure 11 (B) is a schematic diagram of a pooling layer and a convolution layer of an ordinary convolutional neural network model after initialization;
图12(A)是图11(A)的池化层和卷积层的一部分示意图;Figure 12 (A) is a partial schematic view of the pooling layer and the convolution layer of Figure 11 (A);
图12(B)是图11(B)的池化层和卷积层的一部分示意图;Figure 12 (B) is a partial schematic view of the pooling layer and the convolution layer of Figure 11 (B);
图13(A)是优化后的神经网络子模型的全连接层及其前一隐含层的示意图; Figure 13 (A) is a schematic diagram of the fully connected layer of the optimized neural network sub-model and its previous hidden layer;
图13(B)是初始化后的普通卷积神经网络模型的全连接层及其前一隐含层的示意图;Figure 13 (B) is a schematic diagram of the fully connected layer of the normal convolutional neural network model after initialization and its previous hidden layer;
图14(A)是图13(A)的全连接层及其前一隐含层的一部分示意图;Figure 14 (A) is a partial schematic view of the fully connected layer of Figure 13 (A) and its previous hidden layer;
图14(B)是图13(B)的全连接层及其前一隐含层的一部分示意图;Figure 14 (B) is a partial schematic view of the fully connected layer of Figure 13 (B) and its previous hidden layer;
图15是实施例1对神经网络模型中的各权值进行初始化的方法的另一个示意图;15 is another schematic diagram of a method for initializing weights in a neural network model in Embodiment 1;
图16是实施例2的对神经网络模型进行训练的装置的一个示意图;16 is a schematic diagram of an apparatus for training a neural network model of Embodiment 2;
图17是实施例2的提取单元的一个示意图;Figure 17 is a schematic illustration of the extraction unit of Embodiment 2;
图18是实施例2的初始化单元的一个示意图;Figure 18 is a schematic diagram of an initialization unit of Embodiment 2;
图19是实施例2的初始化单元的另一个示意图;Figure 19 is another schematic diagram of the initialization unit of Embodiment 2;
图20是实施例3的电子设备2000的系统构成的一示意框图。20 is a schematic block diagram showing the system configuration of the electronic device 2000 of the third embodiment.
具体实施方式 DETAILED DESCRIPTION
参照附图，通过下面的说明书，本发明的前述以及其它特征将变得明显。在说明书和附图中，具体公开了本发明的特定实施方式，其表明了其中可以采用本发明的原则的部分实施方式，应了解的是，本发明不限于所描述的实施方式，相反，本发明包括落入所附权利要求的范围内的全部修改、变型以及等同物。下面结合附图对本发明的各种实施方式进行说明。这些实施方式只是示例性的，不是对本发明的限制。The foregoing and other features of the present invention will become apparent from the following description taken in conjunction with the accompanying drawings. In the description and drawings, specific embodiments of the invention are disclosed in detail, indicating some of the embodiments in which the principles of the invention may be employed. It should be understood that the invention is not limited to the described embodiments; on the contrary, the invention includes all modifications, variations and equivalents falling within the scope of the appended claims. Various embodiments of the present invention will be described below with reference to the accompanying drawings. These embodiments are merely exemplary and are not limiting of the invention.
实施例1Example 1
本申请实施1提供一种对神经网络模型进行训练的方法,该方法用于确定神经网络模型中的各权值。 Embodiment 1 of the present application provides a method of training a neural network model for determining weights in a neural network model.
图2是本实施例的神经网络模型的一个示意图,如图2所示,该神经网络模型200包括输入层201、卷积层202、池化层203、松弛卷积层204、全连接层205、以及输出层206,其中,卷积层202、池化层203、松弛卷积层204以及全连接层205均为隐含层。2 is a schematic diagram of a neural network model of the present embodiment. As shown in FIG. 2, the neural network model 200 includes an input layer 201, a convolution layer 202, a pooling layer 203, a relaxed convolution layer 204, and a fully connected layer 205. And the output layer 206, wherein the convolution layer 202, the pooling layer 203, the relaxed convolution layer 204, and the fully connected layer 205 are all hidden layers.
在本实施例中，输入层201可以输入待识别数据2011；卷积层202、池化层203、松弛卷积层204、全连接层205和输出层206中的每一层都接收上一层所输出的数据，并采用与本层相对应的权值对数据进行处理，以分别生成本层所输出的数据并从本层的神经元节点（neuron）输出，各层的神经元节点分别为2021-2024、2031-2034、2041-2046、2051-2058和2061-20610，并且，根据输出层206的神经元节点所输出的数据，可以确定待识别数据2011属于每一个类别206a的概率；此外，在图2中，仅标记出了神经元节点2021、2024、2031、2034、2041、2046、2051、2058、2061以及20610，并没有标记出其它神经元节点。In this embodiment, the input layer 201 can input the data to be identified 2011; each of the convolutional layer 202, the pooling layer 203, the relaxed convolutional layer 204, the fully connected layer 205 and the output layer 206 receives the data output by the previous layer and processes that data with the weights corresponding to the layer, so as to generate the data output by the layer and output it from the neuron nodes of the layer. The neuron nodes of the respective layers are 2021-2024, 2031-2034, 2041-2046, 2051-2058 and 2061-20610, and, based on the data output by the neuron nodes of the output layer 206, the probability that the data to be identified 2011 belongs to each category 206a can be determined. In addition, in FIG. 2, only neuron nodes 2021, 2024, 2031, 2034, 2041, 2046, 2051, 2058, 2061 and 20610 are labeled; the other neuron nodes are not labeled.
如图2所示,在使用神经网络模型200来进行手写数字图像的识别时,输入层201输入的待识别数据2011可以是手写数字图像,卷积层202、池化层203和松弛卷积层204的神经元节点所输出的数据可以是特征图(feature map),全连接层205和输出层206的神经元节点所输出的数据可以是数值。数字0-9中的一个数字可以与一个类别206a对应。因此,根据输出层206所输出的数据,可以确定待识别数据2011属于0-9中每一个数字的概率。As shown in FIG. 2, when the neural network model 200 is used to identify the handwritten digital image, the data to be identified 2011 input by the input layer 201 may be a handwritten digital image, a convolution layer 202, a pooling layer 203, and a slack convolution layer. The data output by the neuron node of 204 may be a feature map, and the data output by the neuron nodes of the fully connected layer 205 and the output layer 206 may be numerical values. A number in the numbers 0-9 may correspond to a category 206a. Therefore, based on the data output by the output layer 206, the probability that the data to be identified 2011 belongs to each of 0-9 can be determined.
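The following Python sketch (illustrative only, not part of the original disclosure) shows how such a layer stack turns an input image into per-class probabilities. The image size, kernel size, class count and the helper functions conv2d, max_pool2 and softmax are assumptions made for this example, and the relaxed convolutional layer is omitted here (it is illustrated separately below).

```python
import numpy as np

def conv2d(x, k):
    # Naive 'valid' 2-D convolution of a single-channel image x with kernel k.
    h, w = x.shape; kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

def max_pool2(x):
    # 2x2 max pooling (assumes even spatial dimensions).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax(z):
    # Normalize scores into class probabilities.
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((28, 28))                 # data to be identified (e.g. a handwritten digit)
k1 = rng.standard_normal((5, 5))             # weight of a convolutional layer
feat = max_pool2(conv2d(image, k1))          # convolutional layer + pooling layer
fc_w = rng.standard_normal((feat.size, 10))  # fully connected weights, 10 classes (digits 0-9)
probs = softmax(feat.reshape(-1) @ fc_w)     # output layer: probability of each class
print(probs.sum())                           # probabilities over the 10 classes sum to 1
```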
在神经网络模型200中,每一层所对应的权值选取得合适,才能保证输出层206所输出的分类结果准确,其中,每一层所对应的权值可以是m*n的矩阵,m和n均为自然数。In the neural network model 200, the weights corresponding to each layer are selected to ensure that the classification result output by the output layer 206 is accurate, wherein the weight corresponding to each layer may be a matrix of m*n, m And n are both natural numbers.
本实施例的对神经网络模型进行训练的方法,就是用于确定神经网络模型中各层所对应的权值。The method for training the neural network model in this embodiment is for determining the weight corresponding to each layer in the neural network model.
在本实施例的下述说明中,将以图2所示的神经网络模型200为例来说明本实施例的对神经网络模型进行训练的方法。In the following description of the present embodiment, a method of training the neural network model of the present embodiment will be described by taking the neural network model 200 shown in FIG. 2 as an example.
需要说明的是,在本实施例中,神经网络模型200具有松弛卷积层204,因此,该神经网络模型200属于卷积神经网络模型,当然,本实施例的神经网络模型200也可以不具有松弛卷积层204,本实施例对此并不做限定;并且,本实施例所描述的对神经网络模型进行训练的方法不仅适用于卷积神经网络模型,也适用于其它的神经网络模型。It should be noted that, in this embodiment, the neural network model 200 has a relaxed convolution layer 204. Therefore, the neural network model 200 belongs to a convolutional neural network model. Of course, the neural network model 200 of the present embodiment may not have The relaxation convolution layer 204 is not limited in this embodiment; and the method for training the neural network model described in this embodiment is applicable not only to the convolutional neural network model but also to other neural network models.
图3是本实施例的对神经网络模型进行训练的方法的一个示意图,如图3所示,该方法包括:FIG. 3 is a schematic diagram of a method for training a neural network model according to the embodiment. As shown in FIG. 3, the method includes:
S301、提取神经网络模型的一部分,以形成神经网络子模型;S301. Extract a part of a neural network model to form a neural network sub-model;
S302、对所述神经网络子模型进行训练,以形成优化的神经网络子模型;S302. Train the neural network submodel to form an optimized neural network submodel.
S303、根据所述优化的神经网络子模型中的各权值,初始化所述神经网络模型中的各权值,以形成初始化神经网络模型,并且,所述初始化神经网络模型与所述优化的神经网络子模型具有相同的输出特性; S303. Initialize each weight in the neural network model according to each weight in the optimized neural network submodel to form an initialization neural network model, and initialize the neural network model and the optimized neural network. The network submodel has the same output characteristics;
S304、基于已知训练集,对所述初始化神经网络模型中的各权值进行调整。S304. Adjust, according to the known training set, the weights in the initialized neural network model.
根据本申请的实施例,通过对规模较小的神经网络子模型进行训练,由训练后的该子模型对规模较大的神经网络模型进行初始化,进而对规模较大的神经网络模型进行微调,由此,与直接对规模较大的神经网络模型进行训练的方法相比,本实施例的方法能够避免过拟合和训练时间过长等问题。According to an embodiment of the present application, by training a small-scale neural network sub-model, the trained sub-model is used to initialize a large-scale neural network model, and then fine-tuning a large-scale neural network model. Thus, the method of the present embodiment can avoid problems such as overfitting and excessive training time as compared with a method of directly training a large-scale neural network model.
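As a minimal illustration of why this works, the sketch below (Python with made-up sizes, not taken from the patent text) does not train anything; it only shows that a larger model whose hidden neurons are duplicates of a smaller model's neurons, with the outgoing weights halved, produces exactly the same outputs as the smaller model, which is the property the initialization in step S303 relies on. Linear activations are assumed for brevity; with elementwise nonlinearities the duplicated hidden activations are still identical, so the argument is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sub-model: one hidden layer with 3 neurons; the "full" model below has
# twice as many hidden neurons, mirroring a 50% extraction ratio (assumed setup).
x = rng.random(4)                      # one input sample
W1_sub = rng.standard_normal((4, 3))   # input -> hidden (sub-model)
W2_sub = rng.standard_normal((3, 2))   # hidden -> output (sub-model, 2 classes)

# S303-style initialization: duplicate each hidden neuron, and halve the
# outgoing weights so the duplicated contributions sum to the original value.
W1_full = np.concatenate([W1_sub, W1_sub], axis=1)        # coefficient 1 on incoming weights
W2_full = np.concatenate([W2_sub, W2_sub], axis=0) * 0.5  # coefficient 1/2 on outgoing weights

out_sub = x @ W1_sub @ W2_sub
out_full = x @ W1_full @ W2_full
print(np.allclose(out_sub, out_full))  # True: same output characteristics before fine-tuning
```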
图4是本实施例的提取神经网络模型的一部分的一个方法的示意图,如图4所示,该方法包括:4 is a schematic diagram of a method of extracting a part of a neural network model of the embodiment, as shown in FIG. 4, the method includes:
S401、将所述神经网络模型中的松弛卷积层转化为普通卷积层,以将所述神经网络模型转化为普通卷积神经网络模型;以及S401. Convert a relaxed convolution layer in the neural network model into a common convolution layer to convert the neural network model into a common convolutional neural network model;
S402、提取所述普通卷积神经网络模型的每一个隐含层中的部分神经元节点,以形成所述神经网络子模型。S402. Extract a partial neuron node in each hidden layer of the common convolutional neural network model to form the neural network sub-model.
在步骤S401中,将图2的神经网络模型200中的松弛卷积层204转化为普通卷积层,从而将神经网络模型200从卷积神经网络模型转化为普通卷积神经网络模型。In step S401, the relaxed convolutional layer 204 in the neural network model 200 of FIG. 2 is converted into a normal convolutional layer, thereby transforming the neural network model 200 from a convolutional neural network model to a general convolutional neural network model.
图5是该普通卷积神经网络模型500的一个示意图,其中,松弛卷积层204被转化为普通卷积层504,普通卷积神经网络模型500的其他层与神经网络模型200相同。5 is a schematic diagram of the conventional convolutional neural network model 500 in which the relaxed convolutional layer 204 is transformed into a common convolutional layer 504, and the other layers of the conventional convolutional neural network model 500 are identical to the neural network model 200.
普通卷积层504的数据处理方式与松弛卷积层204的数据处理方式的不同之处在于，在普通卷积层504中，参与卷积操作的同一个神经元节点内的不同位置的数据共享一个权值，而在松弛卷积层204中，参与卷积操作的同一个神经元节点内的不同位置的数据不共享任何一个权值。The data processing of the ordinary convolutional layer 504 differs from that of the relaxed convolutional layer 204 in that, in the ordinary convolutional layer 504, the data at different positions within the same neuron node participating in the convolution operation share one weight, whereas in the relaxed convolutional layer 204 the data at different positions within the same neuron node participating in the convolution operation do not share any weight.
图6是本实施例的松弛卷积层204的处理方式的一个示意图，如图6所示，P1和P2是参与卷积操作的不同的神经元节点，P11、P14是P1中的不同位置的数据，P21、P24是P2中的不同位置的数据，W11、W14、W21、W24是不同的权值，T11、T14是卷积操作后生成的数据，其中，T11和T14的计算方式如下式(1)、(2)所示：FIG. 6 is a schematic diagram of the processing of the relaxed convolutional layer 204 of this embodiment. As shown in FIG. 6, P1 and P2 are different neuron nodes participating in the convolution operation, P11 and P14 are data at different positions in P1, P21 and P24 are data at different positions in P2, W11, W14, W21 and W24 are different weights, and T11 and T14 are data generated by the convolution operation, where T11 and T14 are calculated as shown in the following equations (1) and (2):

T11 = P11*W11 + P21*W21    (1)

T14 = P14*W14 + P24*W24    (2)
图6和式(1)、(2)中仅示出了T11、T14的计算方式,T12、T13的计算方式与之类似,本实施例不再详细说明。In Figure 6 and equations (1) and (2), only the calculation methods of T11 and T14 are shown. The calculation methods of T12 and T13 are similar, and this embodiment will not be described in detail.
如图6和式(1)、(2)所示，神经元节点P1中的数据P11和P14分别对应独立的权值W11、W14，神经元节点P2中的数据P21和P24分别对应独立的权值W21、W24，也就是说，同一个神经元节点内的不同位置的数据不共享任何一个权值。As shown in FIG. 6 and equations (1) and (2), the data P11 and P14 in neuron node P1 correspond to the independent weights W11 and W14, respectively, and the data P21 and P24 in neuron node P2 correspond to the independent weights W21 and W24, respectively; that is, data at different positions within the same neuron node do not share any weight.
图7是本实施例的普通卷积层504的处理方式的一个示意图，P1、P2、P11、P14、P21、P24、T11和T14的含义与图6相同，W1和W2是不同的权值，其中，T11和T14的计算方式如下式(3)、(4)所示：FIG. 7 is a schematic diagram of the processing of the ordinary convolutional layer 504 of this embodiment. The meanings of P1, P2, P11, P14, P21, P24, T11 and T14 are the same as in FIG. 6, and W1 and W2 are different weights, where T11 and T14 are calculated as shown in the following equations (3) and (4):

T11 = P11*W1 + P21*W2    (3)

T14 = P14*W1 + P24*W2    (4)
如图7和式(3)、(4)所示,神经元节点P1中的不同位置的数据P11和P14共享权值W1,神经元节点P2中的不同位置的数据P21和P24共享权值W2。As shown in FIG. 7 and equations (3) and (4), the data P11 and P14 at different positions in the neuron node P1 share the weight W1, and the data P21 and P24 at different positions in the neuron node P2 share the weight W2. .
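The difference can be made concrete with a few lines of Python; the numbers below are made up, and the expressions simply restate equations (1)-(4) for two positions and two input nodes.

```python
# Data at two positions of input nodes P1 and P2 (made-up values for illustration).
P11, P14 = 0.2, 0.7   # two positions inside node P1
P21, P24 = 0.5, 0.1   # two positions inside node P2

# Relaxed convolutional layer: independent weights per position, as in (1) and (2).
W11, W14, W21, W24 = 1.0, 2.0, 3.0, 4.0
T11_relaxed = P11 * W11 + P21 * W21
T14_relaxed = P14 * W14 + P24 * W24

# Ordinary convolutional layer: one weight per input node, shared by all positions,
# as in (3) and (4).
W1, W2 = 1.0, 3.0
T11_ordinary = P11 * W1 + P21 * W2
T14_ordinary = P14 * W1 + P24 * W2

print(T11_relaxed, T14_relaxed, T11_ordinary, T14_ordinary)
```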
在本实施例的步骤S401中，神经网络模型200的松弛卷积层204中的部分权值可以被删除，以减少权值的数量，由此，使得参与卷积操作的同一个神经元节点内的数据共享一个权值，从而将松弛卷积层204转化为普通卷积层504，以将神经网络模型200转化为普通卷积神经网络模型500。In step S401 of this embodiment, part of the weights in the relaxed convolutional layer 204 of the neural network model 200 may be deleted to reduce the number of weights, so that the data within the same neuron node participating in the convolution operation share one weight, thereby converting the relaxed convolutional layer 204 into the ordinary convolutional layer 504 and converting the neural network model 200 into the ordinary convolutional neural network model 500.
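A possible sketch of this weight-deletion step is shown below; which of the per-position weights is retained (here simply the first) is an assumption of this example, since the text only states that part of the weights are deleted so that one weight per node remains.

```python
import numpy as np

# Relaxed-layer weights: one weight per (input node, output position); shape
# (num_input_nodes, num_positions). Values are made up for illustration.
relaxed_w = np.array([[1.0, 2.0],    # W11, W14 for node P1
                      [3.0, 4.0]])   # W21, W24 for node P2

# Step S401 sketch: drop the extra per-position weights so that all positions of
# one input node share a single weight (the first one is kept here by assumption).
ordinary_w = relaxed_w[:, 0]         # -> W1 = 1.0, W2 = 3.0
print(ordinary_w)
```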
在本实施例的步骤S402中，普通卷积神经网络模型500的每一个隐含层的神经元节点可以按照一定的比例被删除，从而得到神经网络子模型，其中，每一个隐含层被删除的神经元节点的比例可以相同，也可以不同。In step S402 of this embodiment, the neuron nodes of each hidden layer of the ordinary convolutional neural network model 500 may be deleted according to a certain ratio, so as to obtain the neural network sub-model, where the ratio of deleted neuron nodes may be the same or different for each hidden layer.
图8是本实施例的神经网络子模型800的一个示意图，如图8所示，在图5的普通卷积神经网络模型500的基础上，每一个隐含层的神经元被删除的比例均为50%，从而形成了神经网络子模型800，其中，图8的801为从普通卷积神经网络模型500中删除的神经元节点，图8的输入层201与输出层206分别与图5的输入层201与输出层206相同，图8的卷积层802、池化层803、卷积层804以及全连接层805与图5的卷积层202、池化层203、普通卷积层504以及全连接层205分别对应。FIG. 8 is a schematic diagram of the neural network sub-model 800 of this embodiment. As shown in FIG. 8, on the basis of the ordinary convolutional neural network model 500 of FIG. 5, 50% of the neurons of each hidden layer are deleted, thereby forming the neural network sub-model 800, where 801 in FIG. 8 denotes neuron nodes deleted from the ordinary convolutional neural network model 500, the input layer 201 and the output layer 206 of FIG. 8 are the same as the input layer 201 and the output layer 206 of FIG. 5, respectively, and the convolutional layer 802, the pooling layer 803, the convolutional layer 804 and the fully connected layer 805 of FIG. 8 correspond to the convolutional layer 202, the pooling layer 203, the ordinary convolutional layer 504 and the fully connected layer 205 of FIG. 5, respectively.
在本实施例的步骤S401和步骤S402中,先将神经网络模型200转化为普通卷积神经网络模型,然后对普通卷积神经网络模型进行神经元节点的删除,以得到神经网络子模型800,其中,将神经网络模型200转化为普通卷积神经网络模型的目的在于,能够使随后生成的神经网络子模型800中的权值的个数减少,避免过拟合。但是,本实施例并不限于此,如果神经网络模型200中不具有松弛卷积层204,则可以直接对神经网络模型200进行神经元节点的删除处理。In step S401 and step S402 of the embodiment, the neural network model 200 is first converted into a common convolutional neural network model, and then the neuron node is deleted from the common convolutional neural network model to obtain the neural network sub-model 800. Among them, the purpose of transforming the neural network model 200 into a common convolutional neural network model is to reduce the number of weights in the subsequently generated neural network sub-model 800 and avoid over-fitting. However, the present embodiment is not limited thereto. If the neural network model 200 does not have the slack convolution layer 204, the neuron node 200 can be directly deleted by the neural network model 200.
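The channel-slicing sketch below (Python/NumPy, with assumed layer sizes) illustrates how keeping half of the neuron nodes of a hidden layer translates into slicing the corresponding weight tensors; a real implementation would apply the same slicing to every hidden layer and to biases as well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ordinary convolutional layer of the big model: 4 output feature maps (neuron
# nodes), 2 input maps, 3x3 kernels -- sizes are made up for illustration.
big_kernels = rng.standard_normal((4, 2, 3, 3))

# Step S402 sketch: keep only half of the neuron nodes of this hidden layer
# (e.g. every second output map) to build the corresponding layer of the sub-model.
keep = np.arange(0, 4, 2)                 # indices of the retained nodes
sub_kernels = big_kernels[keep]           # shape (2, 2, 3, 3)

# If the *previous* layer was also halved, the input channels of this layer must
# be sliced consistently as well:
sub_kernels = sub_kernels[:, :1]          # shape (2, 1, 3, 3)
print(sub_kernels.shape)
```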
在本实施例的步骤S302中，可以根据已知的训练集，对神经网络子模型800进行训练，以确定其中的各权值的优化值，从而将神经网络子模型800训练为优化的神经网络子模型。其中，对神经网络子模型800进行训练的方法可以参考现有技术，本实施例不再赘述。In step S302 of this embodiment, the neural network sub-model 800 may be trained on a known training set to determine optimized values of its weights, thereby training the neural network sub-model 800 into the optimized neural network sub-model. For the method of training the neural network sub-model 800, reference may be made to the prior art, and it is not described again in this embodiment.
图9是本实施例对神经网络模型中的各权值进行初始化的方法的一个示意图,用于实现步骤S303。如图9所示,该方法包括:FIG. 9 is a schematic diagram of a method for initializing each weight in the neural network model according to the embodiment, for implementing step S303. As shown in FIG. 9, the method includes:
S901、根据所述优化的神经网络子模型中的各权值,初始化所述普通卷积神经网络模型中的各权值,以形成初始化普通卷积神经网络模型;S901: Initialize, according to each weight in the optimized neural network submodel, each weight in the common convolutional neural network model to form an initial common convolutional neural network model;
S902、将所述初始化普通卷积神经网络模型中的普通卷积层转化为松弛卷积层,以形成所述初始化神经网络模型。S902. Convert a common convolution layer in the initialized general convolutional neural network model into a relaxation convolution layer to form the initialization neural network model.
在本实施例的步骤S901中,可以根据该优化的神经网络子模型中的各隐含层的权值,对该普通卷积神经网络模型500中对应的隐含层的权值进行初始化,以形成初始化普通卷积神经网络模型,其中,该初始化普通卷积神经网络模型的各隐含层的输出特性与该优化的神经网络子模型的各隐含层的输出特性相同,例如,可以将该优化的神经网络子模型中的隐含层的各权值乘以预定的系数,作为普通卷积神经网络模型中的对应的隐含层的各权值。In step S901 of the embodiment, the weights of the corresponding hidden layers in the common convolutional neural network model 500 may be initialized according to the weights of the hidden layers in the optimized neural network submodel. Forming an initial general convolutional neural network model, wherein an output characteristic of each hidden layer of the initialized general convolutional neural network model is the same as an output characteristic of each hidden layer of the optimized neural network submodel, for example, The weights of the hidden layers in the optimized neural network submodel are multiplied by a predetermined coefficient as the weights of the corresponding hidden layers in the general convolutional neural network model.
下面,以图5的普通卷积神经网络模型500为例,说明对各权值进行初始化的过程。Next, the normal convolutional neural network model 500 of FIG. 5 will be taken as an example to describe the process of initializing the weights.
1、对于作为第一隐含层的卷积层202的各权值初始化1. Initializing the weights of the convolution layer 202 as the first hidden layer
在图5中,卷积层202与输入层201连接,该卷积层202是输入层201之后的第一隐含层。卷积层202的输入数据是输入层201的待识别数据,由该待识别数据与卷积层202的权值进行卷积得到卷积层202各神经元节点的输出数据。In FIG. 5, convolutional layer 202 is coupled to input layer 201, which is the first hidden layer after input layer 201. The input data of the convolutional layer 202 is the data to be identified of the input layer 201, and the data to be identified is convoluted with the weight of the convolutional layer 202 to obtain the output data of each neuron node of the convolutional layer 202.
图10(A)是优化后的神经网络子模型800的输入层201和卷积层802的示意图,图10(B)是初始化后的普通卷积神经网络模型500的输入层201和卷积层202的示意图。10(A) is a schematic diagram of the input layer 201 and the convolution layer 802 of the optimized neural network sub-model 800, and FIG. 10(B) is the input layer 201 and the convolution layer of the initialized normal convolutional neural network model 500. A schematic diagram of 202.
如图10(A)所示,在神经网络子模型800中,输入数据与权值K1卷积得到神经元节点8021输出的特征图A1,输入数据与权值K2卷积得到特征图A2。如图10(B)所示,将权值K1乘以预定的系数L11,作为普通卷积神经网络子模型500卷 积层202的神经元节点2021、2023所对应的权值,将权值K2乘以预定的系数L12,作为普通卷积神经网络子模型500卷积层202的神经元节点2022、2024所对应的权值,从而对普通卷积神经网络模型500的卷积层202的各权值进行初始化。As shown in FIG. 10(A), in the neural network submodel 800, the input data is convoluted with the weight K1 to obtain the feature map A1 output by the neuron node 8021, and the input data is convoluted with the weight K2 to obtain the feature map A2. As shown in FIG. 10(B), the weight K1 is multiplied by a predetermined coefficient L11 as a common convolutional neural network submodel 500 volume. The weights corresponding to the neuron nodes 2021, 2023 of the layer 202 multiply the weight K2 by a predetermined coefficient L12, which corresponds to the neuron nodes 2022, 2024 of the convolutional layer 202 of the common convolutional neural network submodel 500. The weights are thus initialized for each weight of the convolutional layer 202 of the normal convolutional neural network model 500.
在本实施例中,预定的系数L11、L12可以均为1,因此,普通卷积神经网络模型500的卷积层202的神经元节点2021、2022、2023、2024输出的特征图分别为A1、A2、A3、A4,其中,A1=A3,A2=A4,由此,初始化后的普通卷积神经网络模型500的卷积层202与优化后的神经网络子模型800的卷积层802具有相同的输出特性。In this embodiment, the predetermined coefficients L11 and L12 may both be 1, and therefore, the feature maps output by the neuron nodes 2021, 2022, 2023, and 2024 of the convolutional layer 202 of the common convolutional neural network model 500 are respectively A1. A2, A3, A4, where A1 = A3, A2 = A4, whereby the convolutional layer 202 of the normalized convolutional neural network model 500 after initialization has the same convolutional layer 802 as the optimized neural network submodel 800 Output characteristics.
当然,在本实施例中,L11、L12也可以具有别的值,并且可以彼此不同。Of course, in the present embodiment, L11, L12 may have other values, and may be different from each other.
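In code, this first-layer initialization amounts to repeating the sub-model's kernels; the sketch below uses assumed kernel sizes and the coefficients L11 = L12 = 1 mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Optimized sub-model: two kernels K1, K2 for the first convolutional layer
# (5x5 kernels on a single-channel input; sizes are assumed for illustration).
K1 = rng.standard_normal((5, 5))
K2 = rng.standard_normal((5, 5))

# Initialization of the big model's first convolutional layer (4 neuron nodes):
# nodes 2021/2023 reuse K1 and nodes 2022/2024 reuse K2, multiplied by the
# predetermined coefficients L11 = L12 = 1, so that A1 = A3 and A2 = A4.
L11 = L12 = 1.0
big_first_layer = [K1 * L11, K2 * L12, K1 * L11, K2 * L12]
print(len(big_first_layer), big_first_layer[0].shape)
```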
2、对于作为非第一隐含层的卷积层504的各权值初始化2. Initializing the weights of the convolutional layer 504 as a non-first hidden layer
在图5中,卷积层202的各神经元节点所输出的特征图作为池化层203的输入数据,池化层203所输出的特征图作为卷积层504的输入数据,该卷积层504为非第一隐含层。In FIG. 5, the feature map output by each neuron node of the convolution layer 202 is used as input data of the pooling layer 203, and the feature map output by the pooling layer 203 is used as input data of the convolution layer 504, and the convolution layer 504 is a non-first hidden layer.
图11(A)是优化后的神经网络子模型800的池化层803和卷积层804的示意图，图11(B)是初始化后的普通卷积神经网络模型500的池化层203和卷积层504的示意图。FIG. 11(A) is a schematic diagram of the pooling layer 803 and the convolutional layer 804 of the optimized neural network sub-model 800, and FIG. 11(B) is a schematic diagram of the pooling layer 203 and the convolutional layer 504 of the initialized ordinary convolutional neural network model 500.
如图11(A)所示,池化层803的各神经元节点所输出的特征图B1、B2用于生成卷积层804的各神经元节点的特征图C1-C3。其中,在池化层803中,分别将图10(A)的A1、A2与相应的权值进行池化处理,得到B1、B2。As shown in FIG. 11(A), the feature maps B1, B2 outputted by the respective neuron nodes of the pooling layer 803 are used to generate feature maps C1-C3 of the respective neuron nodes of the convolutional layer 804. In the pooling layer 803, A1 and A2 of FIG. 10(A) and the corresponding weights are respectively pooled to obtain B1 and B2.
如图11(B)所示，池化层203的各神经元节点所输出的特征图B1-B4用于生成卷积层504的各神经元节点的特征图C1'-C6'。其中，在池化层203中，分别将图10(B)的A1-A4与相应的权值进行池化处理，得到B1-B4，并且，池化层203中的各权值可以是池化层803中的各权值乘以预定的系数得到的，该预定的系数例如可以是1，因此，在图11(B)中，B1=B3，B2=B4，由此，初始化后的普通卷积神经网络模型500的池化层203与优化后的神经网络子模型800的池化层803具有相同的输出特性。As shown in FIG. 11(B), the feature maps B1-B4 output by the neuron nodes of the pooling layer 203 are used to generate the feature maps C1'-C6' of the neuron nodes of the convolutional layer 504. In the pooling layer 203, A1-A4 of FIG. 10(B) are pooled with the corresponding weights to obtain B1-B4, and each weight in the pooling layer 203 may be obtained by multiplying the corresponding weight in the pooling layer 803 by a predetermined coefficient, which may be, for example, 1. Therefore, in FIG. 11(B), B1 = B3 and B2 = B4, so that the pooling layer 203 of the initialized ordinary convolutional neural network model 500 has the same output characteristics as the pooling layer 803 of the optimized neural network sub-model 800.
图12(A)是图11(A)的池化层803和卷积层804的一部分示意图,图12(B)是图11(B)的池化层203和卷积层504的一部分示意图,在图12(A)和12(B)中,示出了相应的权值。12(A) is a partial schematic view of the pooling layer 803 and the convolution layer 804 of FIG. 11(A), and FIG. 12(B) is a partial schematic view of the pooling layer 203 and the convolution layer 504 of FIG. 11(B). In Figures 12(A) and 12(B), the corresponding weights are shown.
如图12(A)所示，在神经网络子模型800中，B1、B2与权值K3、K4进行卷积，得到神经元节点8041输出的特征图C1，该卷积如下式(5)所示：As shown in FIG. 12(A), in the neural network sub-model 800, B1 and B2 are convolved with the weights K3 and K4 to obtain the feature map C1 output by neuron node 8041, and this convolution (denoted ⊗ below) is as shown in the following equation (5):

C1 = B1 ⊗ K3 + B2 ⊗ K4    (5)
如图12(B)所示,将权值K3乘以预定的系数L21,得到K3’,作为普通卷积神经网络子模型500的卷积层504中与特征图B1、B3所对应的权值,将权值K4乘以预定的系数L22,得到K4’,作为普通卷积神经网络子模型500的卷积层504中与特征图B2、B4所对应的权值,从而对普通卷积神经网络模型500的卷积层504的各权值进行初始化。As shown in FIG. 12(B), the weight K3 is multiplied by a predetermined coefficient L21 to obtain K3' as the weight corresponding to the feature maps B1, B3 in the convolutional layer 504 of the ordinary convolutional neural network submodel 500. Multiplying the weight K4 by a predetermined coefficient L22 to obtain K4' as the weight corresponding to the feature maps B2 and B4 in the convolutional layer 504 of the ordinary convolutional neural network submodel 500, thereby the common convolutional neural network The weights of the convolutional layer 504 of the model 500 are initialized.
在本实施例中，预定的系数L21、L22可以均为1/2，因此，K3'=K3*1/2，K4'=K4*1/2，由此，神经元节点5041所输出的特征图C1'可以如下式(6)所示：In this embodiment, the predetermined coefficients L21 and L22 may both be 1/2, so that K3' = K3*1/2 and K4' = K4*1/2, and the feature map C1' output by neuron node 5041 can be expressed as shown in the following equation (6):

C1' = B1 ⊗ K3' + B2 ⊗ K4' + B3 ⊗ K3' + B4 ⊗ K4' = B1 ⊗ K3 + B2 ⊗ K4    (6)
可见,C1’=C1,由此,初始化后的普通卷积神经网络模型500的卷积层504与优化后的神经网络子模型800的卷积层804具有相同的输出特性。It can be seen that C1' = C1, whereby the convolutional layer 504 of the normalized convolutional neural network model 500 after initialization has the same output characteristics as the convolutional layer 804 of the optimized neural network submodel 800.
当然,在本实施例中,L21、L22也可以具有别的值,并且可以彼此不同。Of course, in the present embodiment, L21, L22 may have other values, and may be different from each other.
在本实施例中，可以采用与上述类似的方法，使用图11(A)的B1、B2与C2之间的权值来对图11(B)的B1-B4与C2'之间的权值进行初始化，使用图11(A)的B1、B2与C3之间的权值来对图11(B)的B1-B4与C3'之间的权值进行初始化，由此，C2'=C2，C3'=C3。In this embodiment, in a similar manner to the above, the weights between B1, B2 and C2 in FIG. 11(A) can be used to initialize the weights between B1-B4 and C2' in FIG. 11(B), and the weights between B1, B2 and C3 in FIG. 11(A) can be used to initialize the weights between B1-B4 and C3' in FIG. 11(B); thus, C2' = C2 and C3' = C3.
在本实施例中，可以采用与上述类似的方法，使用图11(A)的B1、B2与C1之间的权值来对图11(B)的B1-B4与C4'之间的权值进行初始化，如图12(B)所示，可以根据K3、K4来生成K3'、K4'，从而能够根据B1-B4与K3'、K4'生成神经元节点5044所输出的特征图C4'，并且C1'=C4'。同样地，还可以使用图11(A)的B1、B2与C2之间的权值来对图11(B)的B1-B4与C5'之间的权值进行初始化，使用图11(A)的B1、B2与C3之间的权值来对图11(B)的B1-B4与C6'之间的权值进行初始化，由此，C2'=C5'，C3'=C6'。In this embodiment, in a similar manner to the above, the weights between B1, B2 and C1 in FIG. 11(A) can be used to initialize the weights between B1-B4 and C4' in FIG. 11(B); as shown in FIG. 12(B), K3' and K4' can be generated from K3 and K4, so that the feature map C4' output by neuron node 5044 can be generated from B1-B4 and K3', K4', and C1' = C4'. Similarly, the weights between B1, B2 and C2 in FIG. 11(A) can be used to initialize the weights between B1-B4 and C5' in FIG. 11(B), and the weights between B1, B2 and C3 in FIG. 11(A) can be used to initialize the weights between B1-B4 and C6' in FIG. 11(B); thus, C2' = C5' and C3' = C6'.
在本实施例中,如果在卷积层504之后还有其他的卷积层,则该其他的卷积层中各权值的初始化方法与对卷积层504中各权值的初始化方法相似。In the present embodiment, if there are other convolution layers after the convolutional layer 504, the initialization methods of the weights in the other convolutional layers are similar to the initialization methods for the weights in the convolutional layer 504.
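A small numeric check of equations (5) and (6) can be written as follows; 1x1 "kernels" are used so that the convolution reduces to elementwise scaling, which keeps the example short without changing the argument.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sub-model kernels K3, K4 acting on the two pooled maps B1, B2 (scalar 1x1
# "kernels" and made-up feature maps, purely for illustration).
K3, K4 = 2.0, -1.0
B1, B2 = rng.random((4, 4)), rng.random((4, 4))
C1 = B1 * K3 + B2 * K4                     # equation (5)

# Big model: inputs B1-B4 with B3 = B1 and B4 = B2; weights scaled by L21 = L22 = 1/2.
B3, B4 = B1, B2
K3p, K4p = K3 * 0.5, K4 * 0.5
C1p = B1 * K3p + B2 * K4p + B3 * K3p + B4 * K4p   # equation (6)

print(np.allclose(C1, C1p))                # True: C1' = C1
```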
3、对于全连接层205的各权值初始化 3. Initialize the weights of the fully connected layer 205.
在图5中,全连接层205可以位于全部的卷积层之后,并且,全连接层205的后面可以连接输出层206,即,该全连接层205为最后隐含层,或者,全连接层205的后面可以连接有其他的全连接层,即,该全连接层205为非最后隐含层。In FIG. 5, the fully connected layer 205 may be located after all the convolutional layers, and the output layer 206 may be connected to the rear of the fully connected layer 205, that is, the fully connected layer 205 is the last hidden layer, or the fully connected layer. Other fully connected layers may be connected to the rear of 205, i.e., the fully connected layer 205 is a non-final hidden layer.
当全连接层205为非最后隐含层的情况下,全连接层205的各权值的初始化方法可以参考位于第一隐含层之后的卷积层504的初始化方法,并且,在全连接层205中,用乘法操作代替卷积操作,这是因为,全连接层205可以看成是权值为1×1的卷积层。When the fully-connected layer 205 is a non-last hidden layer, the initialization method of each weight of the fully-connected layer 205 may refer to the initialization method of the convolution layer 504 located after the first hidden layer, and at the fully-connected layer. In 205, the convolution operation is replaced by a multiplication operation because the fully-connected layer 205 can be regarded as a convolutional layer having a weight of 1 × 1.
下面,说明当全连接层205为最后隐含层的情况下,对全连接层205的各权值的初始化方法。Next, a method of initializing the weights of the all-connection layer 205 when the fully-connected layer 205 is the last hidden layer will be described.
作为最后隐含层的全连接层205具有与输出层206的类别数一样多的神经元节点。所以,对于优化后的神经网络子模型800的全连接层805和初始化后的普通卷积神经网络模型500的全连接层205,二者的神经元节点的数量相同,但是,二者的输入数据的数量可以不同。The fully connected layer 205, which is the last hidden layer, has as many neuron nodes as the number of classes of the output layer 206. Therefore, for the fully connected layer 805 of the optimized neural network submodel 800 and the fully connected layer 205 of the initialized normal convolutional neural network model 500, the number of neuron nodes is the same, but the input data of the two The number can vary.
图13(A)是优化后的神经网络子模型800的全连接层805及其前一隐含层的示意图,图13(B)是初始化后的普通卷积神经网络模型500的全连接层205及其前一隐含层的示意图。13(A) is a schematic diagram of the fully connected layer 805 of the optimized neural network submodel 800 and its previous hidden layer, and FIG. 13(B) is the fully connected layer 205 of the initialized normal convolutional neural network model 500. And a schematic diagram of the previous hidden layer.
如图13(A)所示,前一隐含层各神经元节点所输出的数据F1、F2用于生成全连接层805的各神经元节点的输出数据E1-E3。如图13(B)所示,前一隐含层的各神经元节点所输出的数据F1-F4用于生成全连接层205的各神经元节点的输出数据E1’-E3’。其中,数据F1-F4可以是浮点数的形式。As shown in FIG. 13(A), the data F1, F2 outputted by the neuron nodes of the previous hidden layer are used to generate the output data E1-E3 of the respective neuron nodes of the fully connected layer 805. As shown in Fig. 13(B), the data F1-F4 outputted by the respective neuron nodes of the previous hidden layer are used to generate the output data E1'-E3' of the respective neuron nodes of the fully connected layer 205. Among them, the data F1-F4 may be in the form of a floating point number.
在图13(A)中,由于之前提取子模型的操作,其前一隐含层的神经元节点的个数比图13(B)的前一隐含层的个数减少了一半,所以,图13(A)中的前一隐含层输出的数据的个数也是图13(B)中的前一隐含层输出的数据的个数的一半。在图13(B)中,前一隐含层在经过初始化以后,可以满足例如下面的条件,F1=F3,F2=F4。In FIG. 13(A), the number of neuron nodes of the previous hidden layer is reduced by half compared with the number of the previous hidden layer of FIG. 13(B) due to the operation of the previous extraction submodel. The number of data outputted by the previous hidden layer in Fig. 13(A) is also half the number of data outputted by the previous hidden layer in Fig. 13(B). In Fig. 13(B), the previous hidden layer can satisfy, for example, the following conditions after initialization, F1 = F3, F2 = F4.
图14(A)是图13(A)的全连接层805及其前一隐含层的一部分示意图,图14(B)是图13(B)的全连接层205及其前一隐含层的一部分示意图,在图14(A)和14(B)中,示出了相应的权值。Figure 14 (A) is a partial schematic view of the fully-connected layer 805 of Figure 13 (A) and its previous hidden layer, Figure 14 (B) is the fully-connected layer 205 of Figure 13 (B) and its previous hidden layer A portion of the schematic diagram, in Figures 14(A) and 14(B), shows the corresponding weights.
如图14(A)所示,在神经网络子模型800中,F1、F2与权值K5、K6相乘,得到神经元节点8051输出的数据E1,该相乘如下式(7)所示: As shown in FIG. 14(A), in the neural network submodel 800, F1 and F2 are multiplied by weights K5 and K6 to obtain data E1 outputted by the neuron node 8051, and the multiplication is as shown in the following equation (7):
E1=F1*K5+F2*K6             (7)E1=F1*K5+F2*K6 (7)
如图14(B)所示，将权值K5乘以预定的系数L31，得到K5'，作为普通卷积神经网络模型500的全连接层205中与F1、F3所对应的权值，将权值K6乘以预定的系数L32，得到K6'，作为普通卷积神经网络模型500的全连接层205中与F2、F4所对应的权值，从而对普通卷积神经网络模型500的全连接层205的各权值进行初始化。As shown in FIG. 14(B), the weight K5 is multiplied by a predetermined coefficient L31 to obtain K5', which is used as the weight corresponding to F1 and F3 in the fully connected layer 205 of the ordinary convolutional neural network model 500, and the weight K6 is multiplied by a predetermined coefficient L32 to obtain K6', which is used as the weight corresponding to F2 and F4 in the fully connected layer 205 of the ordinary convolutional neural network model 500, thereby initializing the weights of the fully connected layer 205 of the ordinary convolutional neural network model 500.
在本实施例中，预定的系数L31、L32可以均为1/2，因此，K5'=K5*1/2，K6'=K6*1/2，由此，神经元节点2051所输出的数据E1'可以如下式(8)所示：In this embodiment, the predetermined coefficients L31 and L32 may both be 1/2, so that K5' = K5*1/2 and K6' = K6*1/2, and the data E1' output by neuron node 2051 can be expressed as shown in the following equation (8):

E1' = F1*K5' + F2*K6' + F3*K5' + F4*K6' = F1*K5 + F2*K6    (8)
可见,E1’=E1,由此,初始化后的普通卷积神经网络模型500的全连接层205与优化后的神经网络子模型800的全连接层805具有相同的输出特性。It can be seen that E1' = E1, whereby the fully connected layer 205 of the initialized normal convolutional neural network model 500 has the same output characteristics as the fully connected layer 805 of the optimized neural network submodel 800.
当然,在本实施例中,L31、L32也可以具有别的值,并且可以彼此不同。Of course, in the present embodiment, L31 and L32 may have other values, and may be different from each other.
在本实施例中,可以采用与上述类似的方法,使用图13(A)的F1、F2与E2之间的权值来对图13(B)的F1-F4与E2’之间的权值进行初始化,使用图13(A)的F1、F2与E3之间的权值来对图13(B)的F1-F4与E3’之间的权值进行初始化,由此,E2’=E2,E3’=E3。In the present embodiment, the weight between F1, F4 and E2' of Fig. 13(B) can be used in a similar manner to the above, using the weights between F1, F2 and E2 of Fig. 13(A). Initialization is performed to initialize the weight between F1-F4 and E3' of FIG. 13(B) using the weights between F1, F2, and E3 of FIG. 13(A), whereby E2'=E2, E3'=E3.
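The same check for the fully connected layer, equations (7) and (8), can be written with made-up scalar values:

```python
# Sub-model fully connected layer: inputs F1, F2 and weights K5, K6 (made-up values).
F1, F2, K5, K6 = 0.3, 0.8, 1.5, -0.4
E1 = F1 * K5 + F2 * K6                     # equation (7)

# Big model: inputs F1-F4 with F3 = F1, F4 = F2; weights scaled by L31 = L32 = 1/2.
F3, F4 = F1, F2
K5p, K6p = K5 * 0.5, K6 * 0.5
E1p = F1 * K5p + F2 * K6p + F3 * K5p + F4 * K6p   # equation (8)

print(abs(E1 - E1p) < 1e-12)               # True: E1' = E1
```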
根据上述的1-3的操作,可以对图5所示的普通卷积神经网络模型500中的各权值进行初始化。当然,普通卷积神经网络模型500并不限于图5所示的结果,普通卷积神经网络模型500中还可以具有其他的隐含层,对该其他的隐含层的初始化方法,可以参考上述1-3的操作。According to the operations of 1-3 described above, the weights in the ordinary convolutional neural network model 500 shown in FIG. 5 can be initialized. Of course, the general convolutional neural network model 500 is not limited to the result shown in FIG. 5. The ordinary convolutional neural network model 500 may have other hidden layers. For the initialization method of the other hidden layers, reference may be made to the above. 1-3 operation.
在本实施例的步骤S902中,将普通卷积层转化为松弛卷积层的处理可以是步骤S401的逆向处理,即,在步骤S902中,可以将普通卷积层内的权值复制多份,使得参与卷积操作的同一个神经元节点内的不同位置的数据对应不同的权值,例如,将图7中的权值W1复制为W11、W14,将权值W2复制为W21、W24,由此,将普通卷积层转化为松弛卷积层。In step S902 of the embodiment, the process of converting the normal convolution layer into the slack convolution layer may be the reverse process of step S401, that is, in step S902, the weights in the ordinary convolution layer may be copied in multiple copies. The data of different positions in the same neuron node participating in the convolution operation are corresponding to different weights. For example, the weight W1 in FIG. 7 is copied into W11 and W14, and the weight W2 is copied into W21 and W24. Thereby, the ordinary convolution layer is converted into a relaxed convolution layer.
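The copying described here can be sketched as a simple weight replication; the number of positions below is assumed for illustration.

```python
import numpy as np

# Step S902 sketch, inverse of S401: copy the single shared weight of each input
# node so that every output position gets its own (initially identical) weight.
ordinary_w = np.array([1.0, 3.0])                  # W1 for node P1, W2 for node P2
num_positions = 2                                  # e.g. the positions producing T11 and T14
relaxed_w = np.repeat(ordinary_w[:, None], num_positions, axis=1)
print(relaxed_w)   # [[1. 1.], [3. 3.]] -- W11 = W14 = W1 and W21 = W24 = W2 right after conversion
```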
图15是本实施例对神经网络模型中的各权值进行初始化的方法的另一个示意图,用于实现步骤S303。如图15所示,该方法包括: FIG. 15 is another schematic diagram of a method for initializing each weight in the neural network model in the embodiment, for implementing step S303. As shown in Figure 15, the method includes:
S901、根据所述优化的神经网络子模型中的各权值,初始化所述普通卷积神经网络模型中的各权值,以形成初始化普通卷积神经网络模型;S901: Initialize, according to each weight in the optimized neural network submodel, each weight in the common convolutional neural network model to form an initial common convolutional neural network model;
S1501、基于已知训练集,对初始化普通卷积神经网络模型中的各权值进行调整,以形成调整后普通卷积神经网络模型;以及S1501: Adjusting, according to a known training set, each weight in an initial general convolutional neural network model to form an adjusted ordinary convolutional neural network model;
S1502、将所述调整后普通卷积神经网络模型中的普通卷积层转化为松弛卷积层,以形成所述初始化神经网络模型。S1502: Convert a common convolution layer in the adjusted ordinary convolutional neural network model into a relaxation convolution layer to form the initialization neural network model.
图15的方法与图9的方法的不同之处在于,在图15的方法中增加了步骤S1501,即,在得到初始化普通卷积神经网络模型之后,对普通卷积神经网络模型进行调整,由此,能够减轻后续在步骤S304中进行调整时的工作量。关于步骤S1501中进行调整的方法,可以参考现有技术中对神经网络模型进行调整的方法,本实施例不再描述。The method of FIG. 15 is different from the method of FIG. 9 in that step S1501 is added to the method of FIG. 15, that is, after the normal convolutional neural network model is initialized, the ordinary convolutional neural network model is adjusted by Thereby, the amount of work when the adjustment is performed in step S304 can be alleviated. For the method for performing the adjustment in step S1501, reference may be made to the method for adjusting the neural network model in the prior art, which is not described in this embodiment.
在本实施例中,步骤S1502中的将普通卷积层转化为松弛卷积层的处理方法可以与步骤S902的处理方法相同。In the present embodiment, the processing method of converting the ordinary convolution layer into the slack convolution layer in step S1502 may be the same as the processing method of step S902.
上述图9和图15所示的方法中，先在步骤S901中对普通卷积神经网络模型中的各权值进行初始化，再通过步骤S902或步骤S1501与S1502将普通卷积层转化为松弛卷积层，从而对神经网络模型200的各权值进行初始化；但是，本实施例并不限于此，如果神经网络模型200不具有松弛卷积层，则可以在步骤S901中直接对神经网络模型200的各权值进行初始化，而无需步骤S902或步骤S1501与S1502。In the methods shown in FIG. 9 and FIG. 15 above, the weights in the ordinary convolutional neural network model are first initialized in step S901, and then the ordinary convolutional layer is converted into a relaxed convolutional layer through step S902 or steps S1501 and S1502, thereby initializing the weights of the neural network model 200. However, this embodiment is not limited thereto; if the neural network model 200 does not have a relaxed convolutional layer, the weights of the neural network model 200 may be initialized directly in step S901, without step S902 or steps S1501 and S1502.
在本实施例的步骤S304中,可以基于已知的训练集,对初始化神经网络模型中的各权值进行调整,调整的方式可以参考现有技术,本实施例不再详细说明。In the step S304 of the embodiment, the weights in the initialization neural network model may be adjusted based on the known training set, and the manner of the adjustment may be referred to the prior art, which is not described in detail in this embodiment.
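For completeness, a generic gradient-descent sketch of such an adjustment step is given below; it is standard prior-art fine-tuning (softmax cross-entropy on a made-up training set), not a procedure specific to this application, and the data, learning rate and number of rounds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Known training set and a weight matrix standing in for the initialized weights.
X = rng.random((32, 10))                   # made-up training inputs
y = rng.integers(0, 2, size=32)            # made-up binary labels
W = rng.standard_normal((10, 2)) * 0.1     # weights to be adjusted

lr = 0.1
for _ in range(5):                         # only a few rounds are needed after initialization
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)      # softmax probabilities
    grad = X.T @ (p - np.eye(2)[y]) / len(X)   # cross-entropy gradient w.r.t. W
    W -= lr * grad
```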
根据本实施例，训练小规模的神经网络模型，并由小规模的神经网络模型对大规模的神经网络模型进行初始化，最后对初始化后的大规模的神经网络模型进行微调，由于小规模的网络已经完成了大部分的训练工作，因此对大规模的网络只需要微调几轮即可收敛，由此，能够避免对大规模的神经网络进行直接训练所产生的过拟合和训练时间过长等问题。According to this embodiment, a small-scale neural network model is trained, the large-scale neural network model is initialized by the small-scale neural network model, and finally the initialized large-scale neural network model is fine-tuned. Since the small-scale network has already completed most of the training work, the large-scale network only needs a few rounds of fine-tuning to converge, so that problems such as over-fitting and excessive training time caused by directly training a large-scale neural network can be avoided.
实施例2Example 2
实施例2提供一种对神经网络模型进行训练的装置,与实施例1的方法对应。 Embodiment 2 provides an apparatus for training a neural network model, which corresponds to the method of Embodiment 1.
图16是本实施例2的对神经网络模型进行训练的装置的一个示意图,如图16所示,该装置1600包括:提取单元1601,第一训练单元1602,初始化单元1603, 以及第二训练单元1604。16 is a schematic diagram of an apparatus for training a neural network model according to the second embodiment. As shown in FIG. 16, the apparatus 1600 includes: an extracting unit 1601, a first training unit 1602, and an initializing unit 1603. And a second training unit 1604.
其中,提取单元1601用于提取神经网络模型的一部分,以形成神经网络子模型;第一训练单元1602用于对所述神经网络子模型进行训练,以形成优化的神经网络子模型;初始化单元1603根据所述优化的神经网络子模型中的各权值,初始化所述神经网络模型中的各权值,以形成初始化神经网络模型,并且,所述初始化神经网络模型与所述优化的神经网络子模型具有相同的输出特性;第二训练单元1604基于已知训练集,对所述初始化神经网络模型中的各权值进行调整。The extracting unit 1601 is configured to extract a part of the neural network model to form a neural network sub-model; the first training unit 1602 is configured to train the neural network sub-model to form an optimized neural network sub-model; and the initializing unit 1603 Initializing each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialization neural network model, and the initializing the neural network model and the optimized neural network The models have the same output characteristics; the second training unit 1604 adjusts the weights in the initialized neural network model based on the known training set.
图17是本实施例2的提取单元1601的一个示意图,如图17所示,该提取单元1601包括第一转化单元1701和提取子单元1702。17 is a schematic diagram of the extracting unit 1601 of the second embodiment. As shown in FIG. 17, the extracting unit 1601 includes a first converting unit 1701 and an extracting subunit 1702.
其中,第一转化单元1701用于将所述神经网络模型中的松弛卷积层转化为普通卷积层,以将所述神经网络模型转化为普通卷积神经网络模型;提取子单元1702用于提取所述普通卷积神经网络模型的每一个隐含层中的部分神经元节点,以形成所述神经网络子模型。The first conversion unit 1701 is configured to convert the relaxed convolution layer in the neural network model into a common convolution layer to convert the neural network model into a common convolutional neural network model; and the extraction subunit 1702 is used to A portion of the neuron nodes in each of the hidden layers of the common convolutional neural network model are extracted to form the neural network sub-model.
图18是本实施例2的初始化单元1603的一个示意图,如图18所示,该初始化单元1603包括第一初始化子单元1801和第二转化单元1802。FIG. 18 is a schematic diagram of the initialization unit 1603 of the second embodiment. As shown in FIG. 18, the initialization unit 1603 includes a first initialization subunit 1801 and a second conversion unit 1802.
其中,第一初始化子单元1801用于根据所述优化的神经网络子模型中的各权值,初始化所述普通卷积神经网络模型中的各权值,以形成初始化普通卷积神经网络模型;第二转化单元1802用于将所述初始化普通卷积神经网络模型中的普通卷积层转化为松弛卷积层,以形成所述初始化神经网络模型。The first initialization subunit 1801 is configured to initialize each weight in the common convolutional neural network model according to each weight in the optimized neural network submodel to form an initialization common convolutional neural network model; The second transforming unit 1802 is configured to convert the normal convolutional layer in the initialized normal convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
在本实施例中,第一初始化子单元1801可以根据所述优化的神经网络子模型中的各隐含层的权值,对所述普通卷积神经网络模型中对应的隐含层的权值进行初始化,以形成初始化普通卷积神经网络模型,其中,所述初始化普通卷积神经网络模型的各隐含层的输出特性与所述优化的神经网络子模型的各隐含层的输出特性相同,例如,第一初始化子单元1801可以将所述优化的神经网络子模型中的隐含层的各权值乘以预定的系数,作为所述普通卷积神经网络模型中的对应的隐含层的各权值。In this embodiment, the first initialization subunit 1801 may determine the weight of the corresponding hidden layer in the common convolutional neural network model according to the weight of each hidden layer in the optimized neural network submodel. Initializing to form an initialization common convolutional neural network model, wherein the output characteristics of the implicit layers of the initialized general convolutional neural network model are the same as the output characteristics of the hidden layers of the optimized neural network submodel For example, the first initialization subunit 1801 may multiply each weight of the hidden layer in the optimized neural network submodel by a predetermined coefficient as a corresponding hidden layer in the common convolutional neural network model. The weights of each.
在本实施例中,第二转化单元1802可以采用与第一转化单元1701相反的操作,将普通卷积神经网络模型转换为松弛卷积神经网络模型。In the present embodiment, the second converting unit 1802 may convert the ordinary convolutional neural network model into a relaxed convolutional neural network model by using an operation opposite to that of the first converting unit 1701.
图19是本实施例2的初始化单元1603的另一个示意图,如图19所示,该初始化单元1603包括:第二初始化子单元1901、第三训练单元1902、以及第三转化单元 1903。FIG. 19 is another schematic diagram of the initializing unit 1603 of the second embodiment. As shown in FIG. 19, the initializing unit 1603 includes: a second initializing subunit 1901, a third training unit 1902, and a third converting unit. 1903.
其中,第二初始化子单元1901根据所述优化的神经网络子模型中的各权值,初始化所述普通卷积神经网络模型中的各权值,以形成初始化普通卷积神经网络模型;第三训练单元1902基于已知训练集,对初始化普通卷积神经网络模型中的各权值进行调整,以形成调整后普通卷积神经网络模型;第三转化单元1903用于将所述调整后普通卷积神经网络模型中的普通卷积层转化为松弛卷积层,以形成所述初始化神经网络模型。The second initialization subunit 1901 initializes each weight in the common convolutional neural network model according to each weight in the optimized neural network submodel to form an initial common convolutional neural network model; The training unit 1902 adjusts each weight in the initialized normal convolutional neural network model to form an adjusted ordinary convolutional neural network model based on the known training set; the third transformation unit 1903 is configured to use the adjusted normal volume The ordinary convolutional layer in the neural network model is transformed into a relaxed convolutional layer to form the initialized neural network model.
在本实施例中,第二初始化子单元1901的处理方式可以与第一初始化子单元1801的处理方式相同;第三训练单元1902对初始化普通卷积神经网络模型中的各权值进行调整的方式可以参考现有技术;第三转化单元1903将普通卷积层转化为松弛卷积层的方式可以参考第二转化单元1802。In this embodiment, the processing manner of the second initialization subunit 1901 may be the same as the processing manner of the first initialization subunit 1801; and the manner in which the third training unit 1902 adjusts the weights in the normal convolutional neural network model. Reference may be made to the prior art; the third conversion unit 1903 may refer to the second conversion unit 1802 by converting the ordinary convolution layer into a relaxed convolution layer.
根据本实施例，训练小规模的神经网络模型，并由小规模的神经网络模型对大规模的神经网络模型进行初始化，最后对初始化后的大规模的神经网络模型进行微调，由于小规模的网络已经完成了大部分的训练工作，因此对大规模的网络只需要微调几轮即可收敛，由此，能够避免对大规模的神经网络进行直接训练所产生的过拟合和训练时间过长等问题。According to this embodiment, a small-scale neural network model is trained, the large-scale neural network model is initialized by the small-scale neural network model, and finally the initialized large-scale neural network model is fine-tuned. Since the small-scale network has already completed most of the training work, the large-scale network only needs a few rounds of fine-tuning to converge, so that problems such as over-fitting and excessive training time caused by directly training a large-scale neural network can be avoided.
实施例3Example 3
本申请实施例3提供一种电子设备,该电子设备包括如实施例2所述的对神经网络模型进行训练的装置。 Embodiment 3 of the present application provides an electronic device including the device for training a neural network model as described in Embodiment 2.
图20是本发明实施例的电子设备2000的系统构成的示意框图。如图20所示,该电子设备2000可以包括中央处理器2100和存储器2140;存储器2140耦合到中央处理器2100。值得注意的是,该图是示例性的;还可以使用其他类型的结构,来补充或代替该结构,以实现电信功能或其他功能。FIG. 20 is a schematic block diagram showing the system configuration of an electronic device 2000 according to an embodiment of the present invention. As shown in FIG. 20, the electronic device 2000 can include a central processor 2100 and a memory 2140; the memory 2140 is coupled to the central processor 2100. It should be noted that the figure is exemplary; other types of structures may be used in addition to or in place of the structure to implement telecommunications functions or other functions.
在一个实施方式中,对神经网络模型进行训练的装置的功能可以被集成到中央处理器2100中。其中,中央处理器2100可以被配置为:In one embodiment, the functionality of the device that trains the neural network model can be integrated into the central processor 2100. The central processing unit 2100 can be configured to:
提取神经网络模型的一部分,以形成神经网络子模型;对所述神经网络子模型进行训练,以形成优化的神经网络子模型;根据所述优化的神经网络子模型中的各权值,初始化所述神经网络模型中的各权值,以形成初始化神经网络模型,并且,所述初始 化神经网络模型与所述优化的神经网络子模型具有相同的输出特性;基于已知训练集,对所述初始化神经网络模型中的各权值进行调整。Extracting a portion of the neural network model to form a neural network sub-model; training the neural network sub-model to form an optimized neural network sub-model; and initializing the weight according to each weight in the optimized neural network sub-model Deriving weights in a neural network model to form an initialized neural network model, and the initial The neural network model has the same output characteristics as the optimized neural network sub-model; and the weights in the initialized neural network model are adjusted based on the known training set.
The central processing unit 2100 may further be configured to: convert the relaxed convolution layers in the neural network model into ordinary convolution layers, so as to convert the neural network model into an ordinary convolutional neural network model; and extract some of the neuron nodes in each hidden layer of the ordinary convolutional neural network model to form the neural network submodel.
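One plausible reading of extracting some of the neuron nodes in each hidden layer, offered here purely as an illustration, is to keep only a fixed fraction of the output channels of every ordinary convolution layer. The sketch below builds such a narrower layer; the 50% keep ratio and the example layer sizes are assumptions, not values taken from this application.

import torch.nn as nn

def shrink_conv(conv: nn.Conv2d, keep: float, in_kept: int) -> nn.Conv2d:
    # Return a narrower Conv2d representing a subset of the original layer's
    # neuron nodes: only a `keep` fraction of the output channels survives,
    # and the input channel count matches what the previous shrunken layer kept.
    out_kept = max(1, int(round(conv.out_channels * keep)))
    return nn.Conv2d(in_kept, out_kept, conv.kernel_size,
                     stride=conv.stride, padding=conv.padding,
                     bias=conv.bias is not None)

# Hypothetical example: a two-layer ordinary CNN shrunk to half width per hidden layer.
# conv1 = nn.Conv2d(1, 64, 3, padding=1)
# conv2 = nn.Conv2d(64, 128, 3, padding=1)
# sub1 = shrink_conv(conv1, keep=0.5, in_kept=1)                   # 64 -> 32 output channels
# sub2 = shrink_conv(conv2, keep=0.5, in_kept=sub1.out_channels)   # 128 -> 64 output channels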
The central processing unit 2100 may further be configured to: initialize the weights in the ordinary convolutional neural network model according to the weights in the optimized neural network submodel, to form an initialized ordinary convolutional neural network model; and convert the ordinary convolution layers in the initialized ordinary convolutional neural network model into relaxed convolution layers, to form the initialized neural network model.
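As a non-authoritative illustration of this layer conversion, assume that a relaxed convolution layer is an ordinary convolution whose weight sharing across spatial positions has been removed, i.e. every output position holds its own kernel. Converting an ordinary convolution layer into a relaxed one can then be sketched as tiling the shared kernel over all output positions, so that the converted layer initially computes exactly the same function and the initialized model keeps the submodel's output characteristics. This reading of "relaxed convolution" is an assumption made for the example only.

import torch

def relax_conv_weight(shared_weight: torch.Tensor, out_h: int, out_w: int) -> torch.Tensor:
    # Tile a shared kernel of shape (out_ch, in_ch, kh, kw) into per-position
    # kernels of shape (out_h, out_w, out_ch, in_ch, kh, kw). Right after
    # tiling, the untied (relaxed) layer behaves identically to the ordinary
    # layer; the copies are free to drift apart during later fine-tuning.
    return shared_weight.unsqueeze(0).unsqueeze(0).expand(
        out_h, out_w, *shared_weight.shape).clone()

# Under the same assumption, the opposite direction used when forming the
# submodel (relaxed -> ordinary) could average the per-position kernels:
# shared = relaxed_weight.mean(dim=(0, 1))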
The central processing unit 2100 may further be configured to: initialize the weights in the ordinary convolutional neural network model according to the weights in the optimized neural network submodel, to form an initialized ordinary convolutional neural network model; adjust the weights in the initialized ordinary convolutional neural network model based on the known training set, to form an adjusted ordinary convolutional neural network model; and convert the ordinary convolution layers in the adjusted ordinary convolutional neural network model into relaxed convolution layers, to form the initialized neural network model.
The central processing unit 2100 may further be configured to: initialize the weights of the corresponding hidden layers in the ordinary convolutional neural network model according to the weights of the hidden layers in the optimized neural network submodel, to form the initialized ordinary convolutional neural network model, wherein the output characteristics of each hidden layer of the initialized ordinary convolutional neural network model are the same as the output characteristics of the corresponding hidden layer of the optimized neural network submodel.
The central processing unit 2100 may further be configured to: multiply the weights of a hidden layer in the optimized neural network submodel by a predetermined coefficient, and use the results as the weights of the corresponding hidden layer in the ordinary convolutional neural network model.
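The value of the predetermined coefficient is not fixed by this application; the following is one hedged example of how it might be chosen. If each input node of a hidden layer in the submodel is duplicated k times in the full model, multiplying the copied incoming weights by 1/k leaves every node's weighted sum, and hence the layer's output characteristics, unchanged. The duplication scheme and the coefficient 1/k below are assumptions made for illustration.

import torch

def fill_full_layer(sub_weight: torch.Tensor, dup_factor: int) -> torch.Tensor:
    # Assumed scheme: every input node of the submodel layer appears
    # dup_factor times in the full model, so the copied weights are scaled by
    # the predetermined coefficient 1/dup_factor and repeated along the
    # input-channel dimension; each output node then receives the same
    # weighted sum as in the submodel.
    coeff = 1.0 / dup_factor
    return (coeff * sub_weight).repeat_interleave(dup_factor, dim=1)

# Hypothetical usage for a conv weight of shape (out_ch, in_ch, kh, kw):
# full_weight = fill_full_layer(sub_weight, dup_factor=2)   # -> (out_ch, 2*in_ch, kh, kw)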
In another implementation, the apparatus for training a neural network model may be configured separately from the central processing unit 2100. For example, the apparatus may be configured as a chip connected to the central processing unit 2100, with its functions realized under the control of the central processing unit.
As shown in FIG. 20, the electronic device 2000 may further include a communication module 2110, an input unit 2120, an audio processing unit 2130, a display 2160, and a power supply 2170. It should be noted that the electronic device 2000 does not necessarily include all of the components shown in FIG. 20; furthermore, the electronic device 2000 may also include components not shown in FIG. 20, for which reference may be made to the prior art.
As shown in FIG. 20, the central processing unit 2100, sometimes also referred to as a controller or an operational control, may include a microprocessor or another processor device and/or logic device. The central processing unit 2100 receives input and controls the operation of each component of the electronic device 2000.
The memory 2140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device, and may store a program for carrying out the relevant processing. The central processing unit 2100 may execute the program stored in the memory 2140 to realize information storage, processing, and the like. The functions of the other components are similar to those in the prior art and are not described again here. The components of the electronic device 2000 may be implemented by dedicated hardware, firmware, software, or a combination thereof without departing from the scope of the invention.
An embodiment of the present application further provides a computer-readable program, wherein, when the program is executed in an information processing apparatus or an electronic device, the program causes the information processing apparatus or the electronic device to carry out the method of training a neural network model described in Embodiment 1.
An embodiment of the present application further provides a storage medium storing a computer-readable program, wherein the computer-readable program causes an information processing apparatus or an electronic device to carry out the method of training a neural network model described in Embodiment 1.
The apparatus for training a neural network model described in connection with the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks shown in FIGS. 16-19, and/or one or more combinations of those functional blocks (for example, the ... unit, ... unit, ... unit, and so on), may correspond either to software modules of a computer program flow or to hardware modules. The software modules may correspond to the respective steps shown in Embodiment 1. The hardware modules may be realized, for example, by fixing the software modules in a field-programmable gate array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor so that the processor can read information from, and write information to, the storage medium; or the storage medium may be an integral part of the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of a mobile terminal, or in a memory card insertable into the mobile terminal. For example, if a device (such as a mobile terminal) uses a large-capacity MEGA-SIM card or a large-capacity flash memory device, the software module may be stored in that MEGA-SIM card or flash memory device.
One or more of the functional blocks described with respect to FIGS. 16-19, and/or one or more combinations of those functional blocks, may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described in the present application. One or more of the functional blocks, and/or one or more combinations of the functional blocks, may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, or any other such configuration.
The present application has been described above in conjunction with specific embodiments, but it should be clear to those skilled in the art that these descriptions are exemplary and do not limit the scope of protection of the present application. Those skilled in the art may make various variations and modifications to the present application based on its principles, and such variations and modifications also fall within the scope of the present application.

Claims (13)

  1. A method of training a neural network model, for determining the weights in the neural network model, the method comprising:
    extracting a portion of the neural network model to form a neural network submodel;
    training the neural network submodel to form an optimized neural network submodel;
    initializing the weights in the neural network model according to the weights in the optimized neural network submodel, to form an initialized neural network model, the initialized neural network model having the same output characteristics as the optimized neural network submodel; and
    adjusting the weights in the initialized neural network model based on a known training set.
  2. The method of training a neural network model according to claim 1, wherein extracting a portion of the neural network model comprises:
    converting the relaxed convolution layers in the neural network model into ordinary convolution layers, so as to convert the neural network model into an ordinary convolutional neural network model; and
    extracting some of the neuron nodes in each hidden layer of the ordinary convolutional neural network model to form the neural network submodel.
  3. The method of training a neural network model according to claim 2, wherein initializing the weights in the neural network model to form an initialized neural network model comprises:
    initializing the weights in the ordinary convolutional neural network model according to the weights in the optimized neural network submodel, to form an initialized ordinary convolutional neural network model; and
    converting the ordinary convolution layers in the initialized ordinary convolutional neural network model into relaxed convolution layers, to form the initialized neural network model.
  4. The method of training a neural network model according to claim 2, wherein initializing the weights in the neural network model to form an initialized neural network model comprises:
    initializing the weights in the ordinary convolutional neural network model according to the weights in the optimized neural network submodel, to form an initialized ordinary convolutional neural network model;
    adjusting the weights in the initialized ordinary convolutional neural network model based on the known training set, to form an adjusted ordinary convolutional neural network model; and
    converting the ordinary convolution layers in the adjusted ordinary convolutional neural network model into relaxed convolution layers, to form the initialized neural network model.
  5. The method of training a neural network model according to claim 3, wherein initializing the weights in the ordinary convolutional neural network model to form the initialized ordinary convolutional neural network model comprises:
    initializing the weights of the corresponding hidden layers in the ordinary convolutional neural network model according to the weights of the hidden layers in the optimized neural network submodel, to form the initialized ordinary convolutional neural network model, wherein the output characteristics of each hidden layer of the initialized ordinary convolutional neural network model are the same as the output characteristics of the corresponding hidden layer of the optimized neural network submodel.
  6. The method of training a neural network model according to claim 5, wherein initializing the weights of the corresponding hidden layers in the ordinary convolutional neural network model according to the weights of the hidden layers in the optimized neural network submodel comprises:
    multiplying the weights of a hidden layer in the optimized neural network submodel by a predetermined coefficient, and using the results as the weights of the corresponding hidden layer in the ordinary convolutional neural network model.
  7. An apparatus for training a neural network model, for determining the weights in the neural network model, the apparatus comprising:
    an extraction unit configured to extract a portion of the neural network model to form a neural network submodel;
    a first training unit configured to train the neural network submodel to form an optimized neural network submodel;
    an initialization unit configured to initialize the weights in the neural network model according to the weights in the optimized neural network submodel, to form an initialized neural network model, the initialized neural network model having the same output characteristics as the optimized neural network submodel; and
    a second training unit configured to adjust the weights in the initialized neural network model based on a known training set.
  8. The apparatus for training a neural network model according to claim 7, wherein the extraction unit comprises:
    a first conversion unit configured to convert the relaxed convolution layers in the neural network model into ordinary convolution layers, so as to convert the neural network model into an ordinary convolutional neural network model; and
    an extraction subunit configured to extract some of the neuron nodes in each hidden layer of the ordinary convolutional neural network model to form the neural network submodel.
  9. The apparatus for training a neural network model according to claim 8, wherein the initialization unit comprises:
    a first initialization subunit configured to initialize the weights in the ordinary convolutional neural network model according to the weights in the optimized neural network submodel, to form an initialized ordinary convolutional neural network model; and
    a second conversion unit configured to convert the ordinary convolution layers in the initialized ordinary convolutional neural network model into relaxed convolution layers, to form the initialized neural network model.
  10. The apparatus for training a neural network model according to claim 8, wherein the initialization unit comprises:
    a second initialization subunit configured to initialize the weights in the ordinary convolutional neural network model according to the weights in the optimized neural network submodel, to form an initialized ordinary convolutional neural network model;
    a third training unit configured to adjust the weights in the initialized ordinary convolutional neural network model based on a known training set, to form an adjusted ordinary convolutional neural network model; and
    a third conversion unit configured to convert the ordinary convolution layers in the adjusted ordinary convolutional neural network model into relaxed convolution layers, to form the initialized neural network model.
  11. The apparatus for training a neural network model according to claim 9, wherein the first initialization subunit
    initializes the weights of the corresponding hidden layers in the ordinary convolutional neural network model according to the weights of the hidden layers in the optimized neural network submodel, to form the initialized ordinary convolutional neural network model, wherein the output characteristics of each hidden layer of the initialized ordinary convolutional neural network model are the same as the output characteristics of the corresponding hidden layer of the optimized neural network submodel.
  12. The apparatus for training a neural network model according to claim 11, wherein the first initialization subunit
    multiplies the weights of a hidden layer in the optimized neural network submodel by a predetermined coefficient, and uses the results as the weights of the corresponding hidden layer in the ordinary convolutional neural network model.
  13. An electronic device comprising the apparatus for training a neural network model according to any one of claims 7 to 12.
PCT/CN2016/077975 2016-03-31 2016-03-31 Method and device for training neural network model, and electronic device WO2017166155A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2016/077975 WO2017166155A1 (en) 2016-03-31 2016-03-31 Method and device for training neural network model, and electronic device
JP2018539870A JP6601569B2 (en) 2016-03-31 2016-03-31 Neural network model training method, apparatus, and electronic apparatus
CN201680061886.8A CN108140144B (en) 2016-03-31 2016-03-31 Method and device for training neural network model and electronic equipment
KR1020187017577A KR102161902B1 (en) 2016-03-31 2016-03-31 Training methods, devices and electronics for neural network models

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/077975 WO2017166155A1 (en) 2016-03-31 2016-03-31 Method and device for training neural network model, and electronic device

Publications (1)

Publication Number Publication Date
WO2017166155A1 true WO2017166155A1 (en) 2017-10-05

Family

ID=59962416

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/077975 WO2017166155A1 (en) 2016-03-31 2016-03-31 Method and device for training neural network model, and electronic device

Country Status (4)

Country Link
JP (1) JP6601569B2 (en)
KR (1) KR102161902B1 (en)
CN (1) CN108140144B (en)
WO (1) WO2017166155A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034367A (en) * 2018-08-22 2018-12-18 广州杰赛科技股份有限公司 Neural network update method, device, computer equipment and readable storage medium storing program for executing
CN109919308A (en) * 2017-12-13 2019-06-21 腾讯科技(深圳)有限公司 A kind of neural network model dispositions method, prediction technique and relevant device
CN110738648A (en) * 2019-10-12 2020-01-31 山东浪潮人工智能研究院有限公司 camera shell paint spraying detection system and method based on multilayer convolutional neural network

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165738B (en) * 2018-09-19 2021-09-14 北京市商汤科技开发有限公司 Neural network model optimization method and device, electronic device and storage medium
CN110288084A (en) * 2019-06-06 2019-09-27 北京小米智能科技有限公司 Super-network training method and device
US11347308B2 (en) 2019-07-26 2022-05-31 Samsung Electronics Co., Ltd. Method and apparatus with gaze tracking
KR102149495B1 (en) * 2019-08-19 2020-08-28 고려대학교 산학협력단 Optimization apparatus for training conditions of environmental prediction model and operating thereof
CN112561026A (en) * 2019-09-25 2021-03-26 北京地平线机器人技术研发有限公司 Neural network model training method and device, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5452400A (en) * 1991-08-30 1995-09-19 Mitsubishi Denki Kabushiki Kaisha Method of optimizing a combination using a neural network
CN104143327A (en) * 2013-07-10 2014-11-12 腾讯科技(深圳)有限公司 Acoustic model training method and device
WO2015054264A1 (en) * 2013-10-08 2015-04-16 Google Inc. Methods and apparatus for reinforcement learning
CN104700153A (en) * 2014-12-05 2015-06-10 江南大学 PH (potential of hydrogen) value predicting method of BP (back propagation) neutral network based on simulated annealing optimization
CN104794527A (en) * 2014-01-20 2015-07-22 富士通株式会社 Method and equipment for constructing classification model based on convolutional neural network
WO2015118686A1 (en) * 2014-02-10 2015-08-13 三菱電機株式会社 Hierarchical neural network device, learning method for determination device, and determination method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002042107A (en) * 2000-07-31 2002-02-08 Fuji Electric Co Ltd Learning method for neural network
CN100595780C (en) * 2007-12-13 2010-03-24 中国科学院合肥物质科学研究院 Handwriting digital automatic identification method based on module neural network SN9701 rectangular array
CN101975092B (en) * 2010-11-05 2012-08-15 中北大学 Real-time prediction method of mine gas concentration in short and medium terms based on radial basis function neural network integration
CN102479339B (en) * 2010-11-24 2014-07-16 香港理工大学 Method and system for forecasting short-term wind speed of wind farm based on hybrid neural network
JP6042274B2 (en) * 2013-06-28 2016-12-14 株式会社デンソーアイティーラボラトリ Neural network optimization method, neural network optimization apparatus and program
CN104346622A (en) * 2013-07-31 2015-02-11 富士通株式会社 Convolutional neural network classifier, and classifying method and training method thereof
CN104751228B (en) * 2013-12-31 2018-04-27 科大讯飞股份有限公司 Construction method and system for the deep neural network of speech recognition
US10832138B2 (en) * 2014-11-27 2020-11-10 Samsung Electronics Co., Ltd. Method and apparatus for extending neural network
CN104978601B (en) * 2015-06-26 2017-08-25 深圳市腾讯计算机系统有限公司 neural network model training system and method
CN105184312B (en) * 2015-08-24 2018-09-25 中国科学院自动化研究所 A kind of character detecting method and device based on deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5452400A (en) * 1991-08-30 1995-09-19 Mitsubishi Denki Kabushiki Kaisha Method of optimizing a combination using a neural network
CN104143327A (en) * 2013-07-10 2014-11-12 腾讯科技(深圳)有限公司 Acoustic model training method and device
WO2015054264A1 (en) * 2013-10-08 2015-04-16 Google Inc. Methods and apparatus for reinforcement learning
CN104794527A (en) * 2014-01-20 2015-07-22 富士通株式会社 Method and equipment for constructing classification model based on convolutional neural network
WO2015118686A1 (en) * 2014-02-10 2015-08-13 三菱電機株式会社 Hierarchical neural network device, learning method for determination device, and determination method
CN104700153A (en) * 2014-12-05 2015-06-10 江南大学 PH (potential of hydrogen) value predicting method of BP (back propagation) neutral network based on simulated annealing optimization

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919308A (en) * 2017-12-13 2019-06-21 腾讯科技(深圳)有限公司 A kind of neural network model dispositions method, prediction technique and relevant device
CN109919308B (en) * 2017-12-13 2022-11-11 腾讯科技(深圳)有限公司 Neural network model deployment method, prediction method and related equipment
CN109034367A (en) * 2018-08-22 2018-12-18 广州杰赛科技股份有限公司 Neural network update method, device, computer equipment and readable storage medium storing program for executing
CN110738648A (en) * 2019-10-12 2020-01-31 山东浪潮人工智能研究院有限公司 camera shell paint spraying detection system and method based on multilayer convolutional neural network

Also Published As

Publication number Publication date
KR102161902B1 (en) 2020-10-05
CN108140144B (en) 2021-06-01
CN108140144A (en) 2018-06-08
JP2019508803A (en) 2019-03-28
JP6601569B2 (en) 2019-11-06
KR20180084969A (en) 2018-07-25

Similar Documents

Publication Publication Date Title
WO2017166155A1 (en) Method and device for training neural network model, and electronic device
WO2019120110A1 (en) Image reconstruction method and device
WO2022042123A1 (en) Image recognition model generation method and apparatus, computer device and storage medium
WO2018018470A1 (en) Method, apparatus and device for eliminating image noise and convolutional neural network
EP4163831A1 (en) Neural network distillation method and device
WO2021164269A1 (en) Attention mechanism-based disparity map acquisition method and apparatus
CN110706214B (en) Three-dimensional U-Net brain tumor segmentation method fusing condition randomness and residual error
WO2020238039A1 (en) Neural network search method and apparatus
WO2022166797A1 (en) Image generation model training method, generation method, apparatus, and device
CN112861659A (en) Image model training method and device, electronic equipment and storage medium
US20210173895A1 (en) Apparatus and method of performing matrix multiplication operation of neural network
CN110506280B (en) Neural network training system, method and computer readable storage medium
CN114463605A (en) Continuous learning image classification method and device based on deep learning
CN112598012B (en) Data processing method in neural network model, storage medium and electronic device
CN107564013B (en) Scene segmentation correction method and system fusing local information
CN111414823B (en) Human body characteristic point detection method and device, electronic equipment and storage medium
CN113674156A (en) Method and system for reconstructing image super-resolution
CN109711454B (en) Feature matching method based on convolutional neural network
WO2022199148A1 (en) Classification model training method, image classification method, electronic device and storage medium
WO2022111231A1 (en) Cnn training method, electronic device, and computer readable storage medium
CN114677535A (en) Training method of domain-adaptive image classification network, image classification method and device
CN110751259A (en) Network layer operation method and device in deep neural network
CN110852348B (en) Feature map processing method, image processing method and device
TWI763975B (en) System and method for reducing computational complexity of artificial neural network
CN108734222B (en) Convolutional neural network image classification method based on correction network

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 20187017577

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2018539870

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16895936

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16895936

Country of ref document: EP

Kind code of ref document: A1