CN108140144B - Method and device for training neural network model and electronic equipment


Info

Publication number
CN108140144B
Authority
CN
China
Prior art keywords
neural network
network model
initialized
weight
convolutional
Prior art date
Legal status
Active
Application number
CN201680061886.8A
Other languages
Chinese (zh)
Other versions
CN108140144A (en)
Inventor
陈理
王淞
范伟
孙俊
直井聪
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date
Filing date
Publication date
Application filed by Fujitsu Ltd
Publication of CN108140144A
Application granted
Publication of CN108140144B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A method, a device and an electronic device for training a neural network model are provided. The method comprises the following steps: extracting a portion of the neural network model to form a neural network sub-model; training the neural network submodel to form an optimized neural network submodel; initializing each weight in the neural network model according to each weight in the optimized neural network submodel to form an initialized neural network model, wherein the initialized neural network model and the optimized neural network submodel have the same output characteristics; and adjusting each weight in the initialized neural network model based on a known training set. According to the method, the training time of the large-scale neural network can be shortened, and the over-fitting problem can be avoided.

Description

Method and device for training neural network model and electronic equipment
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a method and an apparatus for training a neural network model, and an electronic device.
Background
In recent years, classification methods based on Convolutional Neural Networks (CNN) have been highly successful in the field of handwritten character recognition.
The CNN model is a hierarchical model. Fig. 1 is a schematic diagram of a CNN model; as shown in fig. 1, the CNN model is composed of an input layer 101, a plurality of hidden layers 102, and an output layer 103. The input layer 101 provides the data to be processed corresponding to a sample to be recognized; when the sample to be recognized is a grayscale image, the data to be processed is a two-dimensional matrix. The hidden layers 102 may be ordinary convolutional layers, relaxed convolutional layers, pooling layers, neuron layers, fully-connected layers, and so on, and each hidden layer applies a specific operation to the data it receives. The output layer 103 provides the final result of the model; for a CNN model used for classification, the output layer 103 outputs the probability that the sample to be recognized belongs to each class.
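For illustration only, the layered structure described above can be sketched in code. The following Python sketch uses the PyTorch library; the framework, the input size, the kernel sizes, and the channel counts are assumptions of this illustration and are not specified by the model of fig. 1.

    import torch.nn as nn

    # A minimal CNN of the kind sketched in fig. 1: an input layer (a grayscale
    # image given as a two-dimensional matrix, here assumed to be 28 x 28),
    # several hidden layers, and an output layer producing a probability per class.
    cnn = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=5),   # convolutional hidden layer
        nn.ReLU(),
        nn.MaxPool2d(2),                  # pooling hidden layer
        nn.Flatten(),
        nn.Linear(8 * 12 * 12, 10),       # fully-connected hidden layer
        nn.Softmax(dim=1),                # output layer: one probability per class
    )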
It should be noted that the above background description is only for the sake of clarity and complete description of the technical solutions of the present invention and for the understanding of those skilled in the art. Such solutions are not considered to be known to the person skilled in the art merely because they have been set forth in the background section of the invention.
Disclosure of Invention
Much published experimental evidence shows that the larger the scale of the CNN model, the more accurate the recognition result for the sample to be recognized. However, a large-scale CNN model has the following problems in training: a) the larger the model, the more easily it overfits; b) the larger the model, the longer the training time required.
The embodiments of the present application provide a method and an apparatus for training a neural network model, and an electronic device, in which a small-scale neural network model is trained first, the small-scale model is used to initialize a large-scale neural network model, and the initialized large-scale model is then fine-tuned, so that the problems of overfitting and excessively long training time caused by directly training the large-scale neural network can be avoided.
According to a first aspect of embodiments of the present application, there is provided a method for training a neural network model, which is used to determine weights in the neural network model, the method including:
extracting a portion of the neural network model to form a neural network sub-model;
training the neural network submodel to form an optimized neural network submodel;
initializing each weight in the neural network model according to each weight in the optimized neural network submodel to form an initialized neural network model, wherein the initialized neural network model and the optimized neural network submodel have the same output characteristics;
and adjusting each weight in the initialized neural network model based on a known training set.
According to a second aspect of embodiments herein, extracting a portion of the neural network model comprises:
converting a relaxation convolutional layer in the neural network model into a common convolutional layer so as to convert the neural network model into a common convolutional neural network model; and
extracting part of neuron nodes in each hidden layer of the ordinary convolutional neural network model to form the neural network submodel.
According to a third aspect of embodiments herein, initializing each weight in the neural network model to form an initialized neural network model comprises:
initializing each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network submodel to form an initialized ordinary convolutional neural network model; and
converting the common convolutional layer in the initialized common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
According to a fourth aspect of embodiments herein, initializing each weight in the neural network model to form an initialized neural network model comprises:
initializing each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network submodel to form an initialized ordinary convolutional neural network model;
based on a known training set, adjusting each weight in the initialized common convolutional neural network model to form an adjusted common convolutional neural network model; and
converting the common convolutional layer in the adjusted common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
According to a fifth aspect of embodiments herein, initializing each weight in the ordinary convolutional neural network model to form an initialized ordinary convolutional neural network model comprises:
initializing the weight of the corresponding hidden layer in the ordinary convolutional neural network model according to the weight of each hidden layer in the optimized neural network submodel to form an initialized ordinary convolutional neural network model, wherein the output characteristics of each hidden layer of the initialized ordinary convolutional neural network model are the same as the output characteristics of each hidden layer of the optimized neural network submodel.
According to a sixth aspect of the embodiment of the present application, initializing, according to the weight of each hidden layer in the optimized neural network submodel, the weight of the corresponding hidden layer in the ordinary convolutional neural network model includes:
and multiplying each weight of the hidden layer in the optimized neural network submodel by a preset coefficient to serve as each weight of the corresponding hidden layer in the common convolutional neural network model.
According to a seventh aspect of the embodiments of the present application, there is provided an apparatus for training a neural network model, configured to determine weights in the neural network model, the apparatus including:
an extraction unit for extracting a part of the neural network model to form a neural network submodel;
a first training unit for training the neural network submodel to form an optimized neural network submodel;
an initialization unit for initializing each weight in the neural network model according to each weight in the optimized neural network submodel to form an initialized neural network model, wherein the initialized neural network model and the optimized neural network submodel have the same output characteristics; and
a second training unit for adjusting each weight in the initialized neural network model based on a known training set.
According to an eighth aspect of embodiments of the present application, wherein the extraction unit includes:
a first conversion unit, configured to convert a relaxed convolutional layer in the neural network model into a normal convolutional layer, so as to convert the neural network model into a normal convolutional neural network model; and
an extracting subunit, configured to extract a part of the neuron nodes in each hidden layer of the general convolutional neural network model to form the neural network submodel.
According to a ninth aspect of the embodiments of the present application, wherein the initialization unit includes:
the first initialization subunit is used for initializing each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network submodel so as to form an initialized ordinary convolutional neural network model; and
a second conversion unit for converting the ordinary convolutional layer in the initialized ordinary convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
According to a tenth aspect of the embodiments of the present application, wherein the initialization unit includes:
the second initialization subunit initializes each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network submodel to form an initialized ordinary convolutional neural network model;
the third training unit is used for adjusting each weight in the initialized common convolutional neural network model based on a known training set to form an adjusted common convolutional neural network model; and
a third conversion unit, configured to convert the common convolutional layer in the adjusted common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
According to the eleventh aspect of the embodiment of the application, the first initialization subunit initializes, according to the weight of each hidden layer in the optimized neural network sub-model, the weight of a corresponding hidden layer in the ordinary convolutional neural network model to form an initialized ordinary convolutional neural network model, wherein an output characteristic of each hidden layer of the initialized ordinary convolutional neural network model is the same as an output characteristic of each hidden layer of the optimized neural network sub-model.
According to the twelfth aspect of the embodiment of the present application, the first initialization subunit multiplies each weight of the hidden layer in the optimized neural network sub-model by a predetermined coefficient, and uses the result as each weight of the corresponding hidden layer in the ordinary convolutional neural network model.
According to a thirteenth aspect of embodiments of the present application, there is provided an electronic device including the apparatus for training a neural network model according to any one of the seventh to twelfth aspects of the embodiments.
According to a fourteenth aspect of embodiments of the present application, there is provided a computer-readable program, wherein when the program is executed in an apparatus or an electronic device for training a neural network model, the program causes the apparatus or the electronic device for training a neural network model to execute the method for training a neural network model according to any one of the first to sixth aspects of the embodiments.
According to a fifteenth aspect of embodiments of the present application, there is provided a storage medium storing the computer-readable program of the fourteenth aspect above, wherein the computer-readable program causes an apparatus or an electronic device for training a neural network model to execute the method for training a neural network model according to any one of the first to sixth aspects of the above embodiments.
The beneficial effects of the embodiments of the present application are that the training time of the large-scale neural network is shortened and the over-fitting problem is avoided.
Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not so limited in scope. The embodiments of the invention include many variations, modifications and equivalents within the scope of the terms of the appended claims.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments, in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps or components.
Drawings
Elements and features described in one drawing or one implementation of an embodiment of the invention may be combined with elements and features shown in one or more other drawings or implementations. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views, and may be used to designate corresponding parts for use in more than one embodiment.
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a schematic diagram of a CNN model;
FIG. 2 is a schematic view of a neural network model of example 1;
FIG. 3 is a schematic view of a method of training a neural network model according to embodiment 1;
FIG. 4 is a schematic view of a method of extracting a part of a neural network model of embodiment 1;
FIG. 5 is a schematic diagram of a general convolutional neural network model of example 1;
FIG. 6 is a schematic view showing a processing method of a relaxed convolution layer in example 1;
FIG. 7 is a schematic view showing a processing method of a normal convolutional layer of example 1;
FIG. 8 is a schematic view of a neural network submodel of example 1;
FIG. 9 is a schematic diagram of a method of initializing weights in a neural network model according to embodiment 1;
FIG. 10(A) is a schematic of the input layer and convolutional layer of an optimized neural network submodel;
FIG. 10(B) is a schematic diagram of the input layer and convolutional layer of the general convolutional neural network model after initialization;
FIG. 11(A) is a schematic diagram of the pooling layer and convolutional layer of an optimized neural network submodel;
FIG. 11(B) is a schematic diagram of the pooling layer and convolutional layer of the general convolutional neural network model after initialization;
FIG. 12(A) is a schematic of a portion of the pooling layer and convolutional layer of FIG. 11 (A);
FIG. 12(B) is a schematic of a portion of the pooling layer and convolutional layer of FIG. 11 (B);
FIG. 13(A) is a schematic diagram of a fully-connected layer and its preceding hidden layer of an optimized neural network submodel;
FIG. 13(B) is a schematic diagram of a fully-connected layer and its preceding hidden layer of the initialized general convolutional neural network model;
FIG. 14(A) is a schematic diagram of a portion of the fully connected layer of FIG. 13(A) and its preceding hidden layer;
FIG. 14(B) is a schematic diagram of a portion of the fully connected layer of FIG. 13(B) and its preceding hidden layer;
FIG. 15 is another schematic diagram of the method of initializing weights in the neural network model according to embodiment 1;
FIG. 16 is a schematic view of an apparatus for training a neural network model according to embodiment 2;
FIG. 17 is a schematic view of an extraction unit of embodiment 2;
FIG. 18 is a schematic diagram of an initialization unit of embodiment 2;
FIG. 19 is another schematic diagram of an initialization unit of embodiment 2;
fig. 20 is a schematic block diagram of a system configuration of an electronic apparatus 2000 of embodiment 3.
Detailed Description
The foregoing and other features of the invention will become apparent from the following description taken in conjunction with the accompanying drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the embodiments in which the principles of the invention may be employed, it being understood that the invention is not limited to the embodiments described, but, on the contrary, is intended to cover all modifications, variations, and equivalents falling within the scope of the appended claims. Various embodiments of the present invention will be described below with reference to the accompanying drawings. These embodiments are merely exemplary and are not intended to limit the present invention.
Example 1
Embodiment 1 of the present application provides a method for training a neural network model, which is used to determine each weight in the neural network model.
Fig. 2 is a schematic diagram of the neural network model of the present embodiment, and as shown in fig. 2, the neural network model 200 includes an input layer 201, a convolutional layer 202, a pooling layer 203, a relaxed convolutional layer 204, a fully-connected layer 205, and an output layer 206, wherein the convolutional layer 202, the pooling layer 203, the relaxed convolutional layer 204, and the fully-connected layer 205 are hidden layers.
In this embodiment, the input layer 201 may input the data to be recognized 2011. Each of the convolutional layer 202, the pooling layer 203, the relaxed convolutional layer 204, the fully-connected layer 205 and the output layer 206 receives the data output by the previous layer, processes the data with the weights corresponding to the current layer to generate the data output by the current layer, and outputs the data from the neuron nodes (neurons) of the current layer. In fig. 2, only the neuron nodes 2021, 2024, 2031, 2034, 2041, 2046, 2051, 2058, 2061, and 20610 are labeled, and the other neuron nodes are not labeled.
As shown in fig. 2, when the neural network model 200 is used to recognize a handwritten digit image, the data to be recognized 2011 input by the input layer 201 may be the handwritten digit image, the data output by the neuron nodes of the convolutional layer 202, the pooling layer 203, and the relaxed convolutional layer 204 may be feature maps, and the data output by the neuron nodes of the fully-connected layer 205 and the output layer 206 may be numerical values. Each of the digits 0-9 may correspond to one of the categories 206a. Therefore, from the data output by the output layer 206, the probability that the data to be recognized 2011 belongs to each of the digits 0 to 9 can be determined.
In the neural network model 200, proper weights are selected for each layer so that the classification result output by the output layer 206 can be guaranteed to be accurate, where the weight corresponding to each layer may be an m × n matrix, m and n both being natural numbers.
The method for training the neural network model of this embodiment is used to determine the weights corresponding to each layer in the neural network model.
In the following description of the present embodiment, the method for training a neural network model according to the present embodiment will be described by taking the neural network model 200 shown in fig. 2 as an example.
It should be noted that, in the present embodiment, the neural network model 200 has the relaxed convolutional layer 204 and therefore belongs to a convolutional neural network model. Of course, the neural network model 200 of the present embodiment may also have no relaxed convolutional layer 204, and the present embodiment does not limit this. In addition, the method for training a neural network model described in this embodiment is applicable not only to convolutional neural network models but also to other neural network models.
Fig. 3 is a schematic diagram of a method for training a neural network model according to this embodiment, and as shown in fig. 3, the method includes:
s301, extracting a part of the neural network model to form a neural network submodel;
s302, training the neural network submodel to form an optimized neural network submodel;
s303, initializing each weight in the neural network model according to each weight in the optimized neural network submodel to form an initialized neural network model, wherein the initialized neural network model and the optimized neural network submodel have the same output characteristics;
s304, adjusting each weight in the initialized neural network model based on a known training set.
According to this embodiment of the application, the smaller-scale neural network submodel is trained, the trained submodel is used to initialize the larger-scale neural network model, and the larger-scale neural network model is then fine-tuned. Compared with directly training the larger-scale neural network model, the method of this embodiment can avoid problems such as overfitting and excessively long training time.
Fig. 4 is a schematic diagram of a method for extracting a part of a neural network model according to the embodiment, and as shown in fig. 4, the method includes:
s401, converting a relaxation convolutional layer in the neural network model into a common convolutional layer so as to convert the neural network model into a common convolutional neural network model; and
s402, extracting part of neuron nodes in each hidden layer of the ordinary convolutional neural network model to form the neural network submodel.
In step S401, the relaxed convolutional layer 204 in the neural network model 200 of fig. 2 is converted into a normal convolutional layer, thereby converting the neural network model 200 from a convolutional neural network model into a normal convolutional neural network model.
FIG. 5 is a schematic diagram of the generic convolutional neural network model 500, in which the relaxed convolutional layer 204 is converted to a generic convolutional layer 504, and the other layers of the generic convolutional neural network model 500 are identical to the neural network model 200.
The data processing scheme of the normal convolutional layer 504 differs from the data processing scheme of the relaxed convolutional layer 204 in that, in the normal convolutional layer 504, data at different positions within the same neuron node participating in the convolution operation share one weight, and in the relaxed convolutional layer 204, data at different positions within the same neuron node participating in the convolution operation do not share any weight.
Fig. 6 is a schematic diagram of a processing manner of the relaxed convolutional layer 204 in this embodiment, as shown in fig. 6, P1 and P2 are different neuron nodes participating in a convolution operation, P11 and P14 are data at different positions in P1, P21 and P24 are data at different positions in P2, W11, W14, W21 and W24 are different weights, and T11 and T14 are data generated after the convolution operation, where the calculation manners of T11 and T14 are as shown in the following formulas (1) and (2):
T11 = P11 ⊗ W11 + P21 ⊗ W21 (1)
T14 = P14 ⊗ W14 + P24 ⊗ W24 (2)
where ⊗ denotes the convolution operation.
fig. 6 and equations (1) and (2) show only the calculation methods of T11 and T14, and the calculation methods of T12 and T13 are similar to them, and this embodiment will not be described in detail.
As shown in fig. 6 and equations (1) and (2), the data P11 and P14 in the neuron node P1 correspond to independent weights W11 and W14, respectively, and the data P21 and P24 in the neuron node P2 correspond to independent weights W21 and W24, respectively, that is, data at different positions in the same neuron node do not share any weight.
Fig. 7 is a schematic diagram of the processing manner of the normal convolutional layer 504 of this embodiment, the meanings of P1, P2, P11, P14, P21, P24, T11, and T14 are the same as those of fig. 6, W1 and W2 are different weights, where the calculation manners of T11 and T14 are shown in the following formulas (3) and (4):
T11 = P11 ⊗ W1 + P21 ⊗ W2 (3)
T14 = P14 ⊗ W1 + P24 ⊗ W2 (4)
as shown in fig. 7 and equations (3) and (4), the weight W1 is shared by the data P11 and P14 at different positions in the neuron node P1, and the weight W2 is shared by the data P21 and P24 at different positions in the neuron node P2.
In step S401 of this embodiment, part of the weights in the relaxed convolutional layer 204 of the neural network model 200 may be deleted to reduce the number of the weights, so that data in the same neuron node participating in the convolution operation share one weight, thereby converting the relaxed convolutional layer 204 into the normal convolutional layer 504, and converting the neural network model 200 into the normal convolutional neural network model 500.
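As a rough illustration of this conversion, the following Python/NumPy sketch collapses position-specific kernels into a single shared kernel. The array layout (one kernel per output position) and the choice of which kernel to keep are assumptions of this illustration; the embodiment only states that part of the weights is deleted so that one weight is shared.

    import numpy as np

    # Relaxed convolutional layer: one kernel per output position (no sharing).
    # Assumed layout: (num_positions, in_channels, k, k).
    relaxed_weights = np.random.randn(4, 2, 3, 3)

    # Step S401 (sketch): delete the position-specific kernels except one, so
    # that all positions of the same neuron node share a single weight, as in
    # an ordinary convolutional layer.  Keeping the first kernel is an
    # illustrative choice; the embodiment does not fix which weights are kept.
    ordinary_weight = relaxed_weights[0]
    print(ordinary_weight.shape)          # (2, 3, 3)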
In step S402 of this embodiment, the neuron nodes of each hidden layer of the general convolutional neural network model 500 may be deleted according to a certain proportion, so as to obtain a neural network submodel, where the proportion of the neuron nodes deleted by each hidden layer may be the same or different.
Fig. 8 is a schematic diagram of a neural network submodel 800 of the present embodiment, as shown in fig. 8, on the basis of the ordinary convolutional neural network model 500 of fig. 5, the proportion of the neurons of each hidden layer deleted is 50%, so as to form the neural network submodel 800, where 801 of fig. 8 is a neuron node deleted from the ordinary convolutional neural network model 500, the input layer 201 and the output layer 206 of fig. 8 are respectively the same as the input layer 201 and the output layer 206 of fig. 5, and the convolutional layer 802, the pooling layer 803, the convolutional layer 804, and the fully-connected layer 805 of fig. 8 correspond to the convolutional layer 202, the pooling layer 203, the ordinary convolutional layer 504, and the fully-connected layer 205 of fig. 5, respectively.
In step S401 and step S402 of this embodiment, the neural network model 200 is first converted into a common convolutional neural network model, and then the deletion of the neuron node is performed on the common convolutional neural network model to obtain the neural network sub-model 800, where the purpose of converting the neural network model 200 into the common convolutional neural network model is to reduce the number of weights in the subsequently generated neural network sub-model 800 and avoid overfitting. However, the present embodiment is not limited to this, and if the neural network model 200 does not have the relaxed convolution layer 204, the deletion process of the neuron node may be directly performed on the neural network model 200.
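A minimal Python/NumPy sketch of the node-deletion step S402 is given below, assuming a weight tensor laid out as (output channels, input channels, kernel height, kernel width) and a deletion ratio of 50%; both the layout and the choice of which nodes to keep are assumptions of this illustration.

    import numpy as np

    # Weights of one ordinary convolutional hidden layer of the large model.
    W = np.random.randn(8, 4, 3, 3)        # (out_channels, in_channels, k, k)

    # Step S402 (sketch): keep only part of the neuron nodes of this hidden
    # layer (here the first half of the output channels) and, accordingly,
    # only the inputs coming from the kept nodes of the previous hidden layer.
    keep_out = W.shape[0] // 2
    keep_in = W.shape[1] // 2
    W_sub = W[:keep_out, :keep_in]
    print(W_sub.shape)                     # (4, 2, 3, 3)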
In step S302 of this embodiment, the neural network submodel 800 may be trained according to a known training set to determine an optimized value of each weight therein, so as to train the neural network submodel 800 into an optimized neural network submodel. The method for training the neural network submodel 800 may refer to the prior art, and is not described in detail in this embodiment.
In step S303 of this embodiment, the optimized neural network submodel may be used to initialize each weight in the neural network model 200, and the initialized neural network model 200 has the same output characteristics as the optimized neural network submodel, thereby preventing the initialization process from introducing new errors.
Fig. 9 is a schematic diagram of a method for initializing each weight in the neural network model in the present embodiment, which is used to implement step S303. As shown in fig. 9, the method includes:
s901, initializing each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network sub-model to form an initialized ordinary convolutional neural network model;
s902, converting the common convolutional layer in the initialized common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
In step S901 of this embodiment, the weights of the hidden layers in the ordinary convolutional neural network model 500 may be initialized according to the weights of the hidden layers in the optimized neural network sub-model to form an initialized ordinary convolutional neural network model, where output characteristics of the hidden layers of the initialized ordinary convolutional neural network model are the same as those of the hidden layers of the optimized neural network sub-model, for example, each weight of the hidden layer in the optimized neural network sub-model may be multiplied by a predetermined coefficient to serve as each weight of the corresponding hidden layer in the ordinary convolutional neural network model.
Next, a process of initializing each weight value will be described by taking the general convolutional neural network model 500 of fig. 5 as an example.
1. Initialization of weights for convolutional layer 202 as the first hidden layer
In fig. 5, convolutional layer 202 is connected to input layer 201, and convolutional layer 202 is the first hidden layer after input layer 201. The input data of convolutional layer 202 is the data to be identified of input layer 201, and the output data of each neuron node of convolutional layer 202 is obtained by convolving the data to be identified with the weight of convolutional layer 202.
Fig. 10(a) is a schematic diagram of the input layer 201 and the convolutional layer 802 of the optimized neural network submodel 800, and fig. 10(B) is a schematic diagram of the input layer 201 and the convolutional layer 202 of the initialized general convolutional neural network model 500.
As shown in fig. 10(A), in the neural network submodel 800, convolving the input data with the weight K1 yields the feature map A1 output by the neuron node 8021, and convolving the input data with the weight K2 yields the feature map A2. As shown in fig. 10(B), each weight of the convolutional layer 202 of the ordinary convolutional neural network model 500 is initialized by multiplying the weight K1 by a predetermined coefficient L11 and using the result as the weight corresponding to the neuron nodes 2021 and 2023 of the convolutional layer 202, and multiplying the weight K2 by a predetermined coefficient L12 and using the result as the weight corresponding to the neuron nodes 2022 and 2024 of the convolutional layer 202.
In this embodiment, the predetermined coefficients L11 and L12 may both be 1; therefore, the feature maps output by the neuron nodes 2021, 2022, 2023, and 2024 of the convolutional layer 202 of the ordinary convolutional neural network model 500 are A1, A2, A3, and A4, respectively, where A1 = A3 and A2 = A4, so that the convolutional layer 202 of the initialized ordinary convolutional neural network model 500 and the convolutional layer 802 of the optimized neural network submodel 800 have the same output characteristics.
Of course, in the present embodiment, L11 and L12 may have other values and may be different from each other.
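The initialization of the first convolutional layer described above can be sketched as follows in Python/NumPy; the kernel layout and the single-channel input are assumptions of this illustration, and L11 = L12 = 1 as in the example above.

    import numpy as np

    # First convolutional layer of the optimized sub-model 800: two kernels
    # K1 and K2 (assumed layout: (out_channels, k, k) for a 2-D input).
    K_sub = np.random.randn(2, 5, 5)       # [K1, K2]

    # Initialize the four kernels of convolutional layer 202 of the full model:
    # nodes 2021 and 2023 get K1 * L11, nodes 2022 and 2024 get K2 * L12,
    # so that A1 = A3 and A2 = A4.
    L11 = L12 = 1.0
    K_full = np.tile(K_sub, (2, 1, 1))     # [K1, K2, K1, K2]
    K_full = K_full * L11                  # here L11 = L12, so one scaling suffices
    print(K_full.shape)                    # (4, 5, 5)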
2. Weight initialization for convolutional layer 504 as a non-first hidden layer
In fig. 5, the feature map output by each neuron node of convolutional layer 202 is used as input data of pooling layer 203, the feature map output by pooling layer 203 is used as input data of convolutional layer 504, and convolutional layer 504 is a non-first hidden layer.
Fig. 11(a) is a schematic diagram of the pooling layer 803 and the convolutional layer 804 of the optimized neural network submodel 800, and fig. 11(B) is a schematic diagram of the pooling layer 203 and the convolutional layer 504 of the initialized general convolutional neural network model 500.
As shown in fig. 11(A), the feature maps B1 and B2 output from the neuron nodes of the pooling layer 803 are used to generate the feature maps C1 to C3 of the neuron nodes of the convolutional layer 804, where B1 and B2 are obtained in the pooling layer 803 by pooling A1 and A2 of fig. 10(A) with the corresponding weights, respectively.
As shown in fig. 11(B), the feature maps B1-B4 output by the neuron nodes of the pooling layer 203 are used to generate the feature maps C1'-C6' of the neuron nodes of the convolutional layer 504. Here, B1 to B4 are obtained in the pooling layer 203 by pooling A1 to A4 of fig. 10(B) with the corresponding weights, respectively, and each weight in the pooling layer 203 may be obtained by multiplying each weight in the pooling layer 803 by a predetermined coefficient, which may be, for example, 1. Thus, in fig. 11(B), B1 = B3 and B2 = B4, so that the pooling layer 203 of the initialized general convolutional neural network model 500 and the pooling layer 803 of the optimized neural network sub-model 800 have the same output characteristics.
Fig. 12(a) is a schematic diagram of a portion of pooling layer 803 and convolutional layer 804 of fig. 11(a), fig. 12(B) is a schematic diagram of a portion of pooling layer 203 and convolutional layer 504 of fig. 11(B), and in fig. 12(a) and 12(B), the corresponding weights are shown.
As shown in fig. 12(a), in the neural network submodel 800, B1 and B2 are convolved with the weights K3 and K4 to obtain a feature map C1 output by the neuron node 8041, and the convolution is shown in the following formula (5):
C1 = B1 ⊗ K3 + B2 ⊗ K4 (5)
as shown in fig. 12(B), the weight value K3 is multiplied by a predetermined coefficient L21 to obtain K3 ', which is a weight value corresponding to the feature maps B1 and B3 in the convolutional layer 504 of the ordinary convolutional neural network sub-model 500, and the weight value K4 is multiplied by a predetermined coefficient L22 to obtain K4', which is a weight value corresponding to the feature maps B2 and B4 in the convolutional layer 504 of the ordinary convolutional neural network sub-model 500, so that each weight value of the convolutional layer 504 of the ordinary convolutional neural network model 500 is initialized.
In this embodiment, the predetermined coefficients L21 and L22 may both be 1/2; therefore, K3' = K3 × 1/2 and K4' = K4 × 1/2, so that the feature map C1' output by the neuron node 5041 may be expressed by the following formula (6):
C1' = B1 ⊗ K3' + B2 ⊗ K4' + B3 ⊗ K3' + B4 ⊗ K4' = (B1 ⊗ K3 + B2 ⊗ K4 + B3 ⊗ K3 + B4 ⊗ K4) × 1/2 = B1 ⊗ K3 + B2 ⊗ K4 (6)
where the last step uses B1 = B3 and B2 = B4.
as can be seen, C1' is C1, so the convolutional layer 504 of the initialized general convolutional neural network model 500 has the same output characteristics as the convolutional layer 804 of the optimized neural network submodel 800.
Of course, in the present embodiment, L21 and L22 may have other values and may be different from each other.
In the present embodiment, similarly to the above method, the weights between B1-B4 and C2' of fig. 11(B) are initialized using the weights between B1, B2 and C2 of fig. 11(A), and the weights between B1-B4 and C3' of fig. 11(B) are initialized using the weights between B1, B2 and C3 of fig. 11(A), so that C2' = C2 and C3' = C3.
In the present embodiment, similarly to the above method, the weights between B1-B4 and C4' in fig. 11(B) may be initialized using the weights between B1, B2 and C1 in fig. 11(A); as shown in fig. 12(B), K3' and K4' may be generated from K3 and K4, so that the feature map C4' output by the neuron node 5044 can be generated from B1-B4, K3' and K4', and C4' = C1'. Similarly, the weights between B1-B4 and C5' in fig. 11(B) may be initialized using the weights between B1, B2 and C2 in fig. 11(A), and the weights between B1-B4 and C6' in fig. 11(B) may be initialized using the weights between B1, B2 and C3 in fig. 11(A), so that C5' = C2' and C6' = C3'.
In this embodiment, if there are other convolutional layers after convolutional layer 504, the method of initializing each weight in the other convolutional layers is similar to the method of initializing each weight in convolutional layer 504.
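The initialization of a non-first convolutional layer such as layer 504 can likewise be sketched in Python/NumPy; the kernel layout and sizes are assumptions of this illustration, with the coefficients L21 = L22 = 1/2 as in the example above.

    import numpy as np

    # Convolutional layer 804 of the optimized sub-model: 3 output maps
    # (C1..C3) computed from 2 input maps (B1, B2).
    K_sub = np.random.randn(3, 2, 3, 3)    # assumed layout (out, in, k, k)

    # Convolutional layer 504 of the full model: 6 output maps (C1'..C6') from
    # 4 input maps (B1..B4), where B1 = B3 and B2 = B4 after initialization.
    # Duplicate each kernel along the input axis and scale by 1/2, then repeat
    # the whole block along the output axis, so that C1' = C4' = C1, and so on.
    K_half = np.tile(K_sub, (1, 2, 1, 1)) * 0.5
    K_full = np.tile(K_half, (2, 1, 1, 1))
    print(K_full.shape)                    # (6, 4, 3, 3)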
3. Weight initialization for fully connected layer 205
In fig. 5, the fully-connected layer 205 is located after all of the convolutional layers. The output layer 206 may be connected after the fully-connected layer 205, in which case the fully-connected layer 205 is the last hidden layer; alternatively, other fully-connected layers may be connected after the fully-connected layer 205, in which case the fully-connected layer 205 is not the last hidden layer.
When the fully-connected layer 205 is not the last hidden layer, the initialization of each weight of the fully-connected layer 205 may follow the initialization method of the convolutional layer 504, which is a non-first hidden layer, with the convolution operation replaced by a multiplication operation, because the fully-connected layer 205 can be regarded as a convolutional layer whose weights are 1 × 1.
Next, a method of initializing each weight of the fully-connected layer 205 when the fully-connected layer 205 is the last hidden layer will be described.
The fully connected layer 205, which is the last hidden layer, has as many neuron nodes as the number of classes of the output layer 206. Therefore, the number of neuron nodes of the fully-connected layer 805 of the optimized neural network submodel 800 and the number of neuron nodes of the fully-connected layer 205 of the initialized general convolutional neural network model 500 are the same, but the number of input data of the two may be different.
Fig. 13(a) is a schematic diagram of the fully-connected layer 805 and its preceding hidden layer of the optimized neural network submodel 800, and fig. 13(B) is a schematic diagram of the fully-connected layer 205 and its preceding hidden layer of the initialized general convolutional neural network model 500.
As shown in fig. 13(A), the data F1 and F2 output by each neuron node of the preceding hidden layer are used to generate the output data E1 to E3 of each neuron node of the fully-connected layer 805. As shown in fig. 13(B), the data F1-F4 output by each neuron node of the preceding hidden layer are used to generate the output data E1'-E3' of each neuron node of the fully-connected layer 205. The data F1-F4 may be in the form of floating-point numbers.
In fig. 13(A), because of the earlier submodel-extraction operation, the number of neuron nodes of the preceding hidden layer is half of that in fig. 13(B), so the amount of data output by the preceding hidden layer in fig. 13(A) is also half of that in fig. 13(B). In fig. 13(B), after the preceding hidden layer has been initialized, conditions such as F1 = F3 and F2 = F4 may be satisfied.
Fig. 14(a) is a schematic diagram of a portion of the fully-connected layer 805 and its preceding hidden layer of fig. 13(a), fig. 14(B) is a schematic diagram of a portion of the fully-connected layer 205 and its preceding hidden layer of fig. 13(B), and in fig. 14(a) and 14(B), the corresponding weights are shown.
As shown in fig. 14(a), in the neural network submodel 800, F1 and F2 are multiplied by the weights K5 and K6 to obtain data E1 output by the neuron node 8051, and the multiplication is shown in the following formula (7):
E1=F1*K5+F2*K6 (7)
as shown in fig. 14(B), the weight K5 is multiplied by a predetermined coefficient L31 to obtain K5 ', which is the weight corresponding to F1 and F3 in the fully-connected layer 205 of the ordinary convolutional neural network model 500, and the weight K6 is multiplied by a predetermined coefficient L32 to obtain K5', which is the weight corresponding to F2 and F4 in the fully-connected layer 205 of the ordinary convolutional neural network model 500, so that each weight of the fully-connected layer 205 of the ordinary convolutional neural network model 500 is initialized.
In the present embodiment, the predetermined coefficients L31 and L32 may both be 1/2; therefore, K5' = K5 × 1/2 and K6' = K6 × 1/2, so that the data E1' output by the neuron node 2051 may be expressed by the following formula (8):
E1' = F1*K5' + F2*K6' + F3*K5' + F4*K6' = (F1*K5 + F2*K6 + F3*K5 + F4*K6) × 1/2 = F1*K5 + F2*K6 (8)
where the last step uses F1 = F3 and F2 = F4.
as can be seen, E1 ═ E1, and thus the fully-connected layer 205 of the initialized generic convolutional neural network model 500 has the same output characteristics as the fully-connected layer 805 of the optimized neural network submodel 800.
Of course, in the present embodiment, L31 and L32 may have other values and may be different from each other.
In the present embodiment, similarly to the above method, the weights between F1-F4 and E2' of fig. 13(B) may be initialized using the weights between F1, F2 and E2 of fig. 13(A), and the weights between F1-F4 and E3' of fig. 13(B) may be initialized using the weights between F1, F2 and E3 of fig. 13(A), so that E2' = E2 and E3' = E3.
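A corresponding Python/NumPy sketch for the last fully-connected layer is given below; the (output, input) weight layout is an assumption of this illustration, with L31 = L32 = 1/2 as in the example above.

    import numpy as np

    # Fully-connected layer 805 of the optimized sub-model: 3 output nodes
    # (E1..E3) computed from 2 inputs (F1, F2).
    W_sub = np.random.randn(3, 2)          # row 0 holds K5, K6 for node E1

    # Fully-connected layer 205 of the full model: 3 outputs (E1'..E3') from
    # 4 inputs (F1..F4), with F1 = F3 and F2 = F4.  Duplicating along the
    # input axis and scaling by 1/2 gives E1' = F1*K5 + F2*K6 = E1, and so on.
    W_full = np.tile(W_sub, (1, 2)) * 0.5
    print(W_full.shape)                    # (3, 4)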
According to the operations 1-3 described above, each weight in the general convolutional neural network model 500 shown in fig. 5 can be initialized. Of course, the general convolutional neural network model 500 is not limited to the structure shown in fig. 5; it may also have other hidden layers, and the initialization of those other hidden layers may likewise refer to operations 1 to 3 above.
In step S902 of the present embodiment, the process of converting the normal convolutional layer into the relaxed convolutional layer may be the reverse of step S401. That is, in step S902, the weights in the normal convolutional layer may be copied into multiple copies so that data at different positions in the same neuron node participating in the convolution operation correspond to different weights; for example, the weight W1 in fig. 7 is copied into W11 and W14, and the weight W2 is copied into W21 and W24, thereby converting the normal convolutional layer into the relaxed convolutional layer.
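The reverse conversion of step S902 can be sketched as follows in Python/NumPy; the number of output positions and the kernel size are assumptions of this illustration.

    import numpy as np

    # Step S902 (sketch): copy the shared kernel W1 of the ordinary convolutional
    # layer once per output position, so that different positions of the same
    # neuron node obtain their own (initially identical) weights W11..W14,
    # turning the layer back into a relaxed convolutional layer without changing
    # its output characteristics.
    W1 = np.random.randn(3, 3)
    num_positions = 4                      # assumed number of output positions
    relaxed_weights = np.stack([W1.copy() for _ in range(num_positions)])
    print(relaxed_weights.shape)           # (4, 3, 3)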
Fig. 15 is another schematic diagram of the method for initializing each weight in the neural network model according to the present embodiment, and is used to implement step S303. As shown in fig. 15, the method includes:
s901, initializing each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network sub-model to form an initialized ordinary convolutional neural network model;
s1501, based on a known training set, adjusting each weight in the initialized common convolutional neural network model to form an adjusted common convolutional neural network model; and
s1502, converting the common convolutional layer in the adjusted common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
The method of fig. 15 is different from the method of fig. 9 in that step S1501 is added to the method of fig. 15, that is, after the initialized normal convolutional neural network model is obtained, the normal convolutional neural network model is adjusted, and thus, the workload in the subsequent adjustment in step S304 can be reduced. Regarding the method for adjusting in step S1501, reference may be made to a method for adjusting a neural network model in the prior art, and this embodiment will not be described again.
In this embodiment, the processing of converting the normal convolutional layer into the relaxed convolutional layer in step S1502 may be the same as the processing of step S902.
In the method shown in fig. 9 and fig. 15, in step S901, each weight in the ordinary convolutional neural network model is initialized, and then the ordinary convolutional layer is converted into a relaxed convolutional layer through step S902 or steps S1501 and S1502, so that each weight of the neural network model 200 is initialized; however, the present embodiment is not limited thereto, and if the neural network model 200 does not have the relaxation convolution layer, the weights of the neural network model 200 may be initialized directly in step S901 without step S902 or steps S1501 and S1502.
In step S304 of this embodiment, each weight in the initialized neural network model may be adjusted based on a known training set, and the adjustment manner may refer to the prior art, and this embodiment is not described in detail.
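As a rough illustration of step S304, the following Python/NumPy sketch fine-tunes an initialized weight matrix with plain gradient descent on a squared-error loss; the tiny linear layer, the loss, and the random training data are stand-ins assumed for this illustration, not the training procedure of the embodiment.

    import numpy as np

    rng = np.random.default_rng(0)
    # Weights initialized from a sub-model as described above (shape (3, 4)).
    W = np.tile(rng.standard_normal((3, 2)), (1, 2)) * 0.5
    x = rng.standard_normal((16, 4))       # inputs from a known training set
    y = rng.standard_normal((16, 3))       # corresponding targets
    lr = 0.1
    for _ in range(100):
        pred = x @ W.T                     # forward pass
        grad = (pred - y).T @ x / len(x)   # gradient of the mean squared error
        W -= lr * grad                     # adjust each weight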
According to this embodiment, a small-scale neural network model is trained, a large-scale neural network model is initialized by the small-scale neural network model, and the initialized large-scale neural network model is fine-tuned, so that the problems of overfitting and excessively long training time caused by directly training the large-scale neural network can be avoided.
Example 2
Embodiment 2 provides an apparatus for training a neural network model, corresponding to the method of embodiment 1.
Fig. 16 is a schematic diagram of an apparatus for training a neural network model according to embodiment 2, and as shown in fig. 16, the apparatus 1600 includes: an extraction unit 1601, a first training unit 1602, an initialization unit 1603, and a second training unit 1604.
The extracting unit 1601 is configured to extract a part of the neural network model to form a neural network submodel; the first training unit 1602 is configured to train the neural network sub-model to form an optimized neural network sub-model; the initializing unit 1603 initializes each weight in the neural network model according to each weight in the optimized neural network submodel to form an initialized neural network model, and the initialized neural network model and the optimized neural network submodel have the same output characteristics; the second training unit 1604 adjusts the weights in the initialized neural network model based on the known training set.
Fig. 17 is a schematic diagram of an extraction unit 1601 of the present embodiment 2, and as shown in fig. 17, the extraction unit 1601 includes a first conversion unit 1701 and an extraction sub-unit 1702.
The first conversion unit 1701 is configured to convert a relaxed convolutional layer in the neural network model into a normal convolutional layer, so as to convert the neural network model into a normal convolutional neural network model; the extracting sub-unit 1702 is configured to extract a part of the neuron nodes in each hidden layer of the general convolutional neural network model to form the neural network sub-model.
Fig. 18 is a schematic diagram of the initialization unit 1603 of the embodiment 2, and as shown in fig. 18, the initialization unit 1603 includes a first initialization sub-unit 1801 and a second conversion unit 1802.
The first initialization subunit 1801 is configured to initialize each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network sub-model, so as to form an initialized ordinary convolutional neural network model; a second conversion unit 1802 is configured to convert the ordinary convolutional layers in the initialized ordinary convolutional neural network model into relaxed convolutional layers to form the initialized neural network model.
In this embodiment, the first initialization subunit 1801 may initialize a weight of a corresponding hidden layer in the ordinary convolutional neural network model according to the weight of each hidden layer in the optimized neural network sub-model to form an initialized ordinary convolutional neural network model, where an output characteristic of each hidden layer of the initialized ordinary convolutional neural network model is the same as an output characteristic of each hidden layer of the optimized neural network sub-model, for example, the first initialization subunit 1801 may multiply each weight of a hidden layer in the optimized neural network sub-model by a predetermined coefficient to serve as each weight of a corresponding hidden layer in the ordinary convolutional neural network model.
In this embodiment, the second conversion unit 1802 may convert the ordinary convolutional neural network model into a relaxed convolutional neural network model using an operation reverse to that of the first conversion unit 1701.
Fig. 19 is another schematic diagram of the initialization unit 1603 of the embodiment 2, and as shown in fig. 19, the initialization unit 1603 includes: a second initialization sub-unit 1901, a third training unit 1902, and a third transformation unit 1903.
The second initializing subunit 1901 initializes each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network sub-model to form an initialized ordinary convolutional neural network model; the third training unit 1902 adjusts each weight in the initialized common convolutional neural network model based on a known training set to form an adjusted common convolutional neural network model; the third converting unit 1903 is configured to convert the ordinary convolutional layer in the adjusted ordinary convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
In this embodiment, the second initialization sub-unit 1901 may perform the same processing as the first initialization sub-unit 1801; the manner in which the third training unit 1902 adjusts each weight in the initialized common convolutional neural network model may refer to the prior art; and the manner in which the third conversion unit 1903 converts the normal convolutional layer into the relaxed convolutional layer may refer to the second conversion unit 1802.
According to this embodiment, a small-scale neural network model is trained, a large-scale neural network model is initialized by the small-scale neural network model, and the initialized large-scale neural network model is fine-tuned, so that the problems of overfitting and excessively long training time caused by directly training the large-scale neural network can be avoided.
Example 3
Embodiment 3 of the present application provides an electronic device, which includes the apparatus for training a neural network model according to embodiment 2.
Fig. 20 is a schematic block diagram of a system configuration of the electronic apparatus 2000 according to the embodiment of the present invention. As shown in fig. 20, the electronic device 2000 may include a central processor 2100 and a memory 2140; the memory 2140 is coupled to the central processor 2100. Notably, this diagram is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.
In one embodiment, the functionality of the means for training the neural network model may be integrated into the central processor 2100. Wherein, the central processor 2100 may be configured to:
extracting a portion of the neural network model to form a neural network sub-model; training the neural network submodel to form an optimized neural network submodel; initializing each weight in the neural network model according to each weight in the optimized neural network submodel to form an initialized neural network model, wherein the initialized neural network model and the optimized neural network submodel have the same output characteristics; and adjusting each weight in the initialized neural network model based on a known training set.
Wherein, the central processor 2100 may be further configured to: converting a relaxation convolutional layer in the neural network model into a common convolutional layer so as to convert the neural network model into a common convolutional neural network model; extracting part of neuron nodes in each hidden layer of the ordinary convolutional neural network model to form the neural network submodel.
Wherein, the central processor 2100 may be further configured to: initializing each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network submodel to form an initialized ordinary convolutional neural network model; and converting the common convolutional layer in the initialized common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
Wherein, the central processor 2100 may be further configured to: initializing each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network submodel to form an initialized ordinary convolutional neural network model; based on a known training set, adjusting each weight in the initialized common convolutional neural network model to form an adjusted common convolutional neural network model; and converting the common convolutional layer in the adjusted common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
Wherein, the central processor 2100 may be further configured to: initializing the weight of the corresponding hidden layer in the ordinary convolutional neural network model according to the weight of each hidden layer in the optimized neural network submodel to form an initialized ordinary convolutional neural network model, wherein the output characteristics of each hidden layer of the initialized ordinary convolutional neural network model are the same as the output characteristics of each hidden layer of the optimized neural network submodel.
The central processor 2100 may be further configured to: multiply each weight of the hidden layer in the optimized neural network submodel by a preset coefficient, and use the result as the corresponding weight of the corresponding hidden layer in the ordinary convolutional neural network model.
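One possible choice of the preset coefficient, assumed here to be the reciprocal of the widening ratio, can be checked numerically: if the submodel's kernels are duplicated across the extra input and output feature maps and scaled by this coefficient (with the bias copied without scaling), the wider layer reproduces the submodel's output feature maps, i.e. the two layers have the same output characteristics. The channel counts and the coefficient rule below are assumptions of the sketch.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Submodel hidden layer: 8 -> 16 feature maps; full-model layer: 16 -> 32 (twice as wide).
w_sub = torch.randn(16, 8, 3, 3)
b_sub = torch.randn(16)

k = 2                # width ratio between the full model and the submodel
coef = 1.0 / k       # the "preset coefficient" assumed in this sketch

# Duplicate the submodel kernel across the extra input and output feature maps,
# scaling by the coefficient so that each duplicated input map is not counted twice.
w_full = w_sub.repeat(k, k, 1, 1) * coef
b_full = b_sub.repeat(k)

# If the incoming feature maps of the full model are k copies of the submodel's
# feature maps, its outgoing feature maps are also k copies of the submodel's.
x_sub = torch.randn(1, 8, 14, 14)
x_full = x_sub.repeat(1, k, 1, 1)
y_sub = F.conv2d(x_sub, w_sub, b_sub)
y_full = F.conv2d(x_full, w_full, b_full)
assert torch.allclose(y_full, y_sub.repeat(1, k, 1, 1), atol=1e-5)
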
In another embodiment, the apparatus for training the neural network model may be configured separately from the central processor 2100; for example, the apparatus may be implemented as a chip connected to the central processor 2100, with its functions realized under the control of the central processor 2100.
As shown in Fig. 20, the electronic device 2000 may further include: a communication module 2110, an input unit 2120, an audio processing unit 2130, a display 2160, and a power supply 2170. It should be noted that the electronic device 2000 does not necessarily include all of the components shown in Fig. 20; furthermore, the electronic device 2000 may also include components not shown in Fig. 20, for which reference may be made to the prior art.
As shown in Fig. 20, the central processor 2100, sometimes also referred to as a controller or operation controller, may include a microprocessor or other processor device and/or logic device; the central processor 2100 receives input and controls the operation of the various components of the electronic device 2000.
The memory 2140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable devices, and may store a program for executing relevant information. The central processor 2100 may execute the program stored in the memory 2140 to realize information storage, processing, or the like. The functions of the other parts are similar to those in the prior art and are not described in detail here. The various components of the electronic device 2000 may be implemented by dedicated hardware, firmware, software, or a combination thereof, without departing from the scope of the invention.
An embodiment of the present application also provides a computer-readable program which, when executed in an information processing apparatus or an electronic device, causes the information processing apparatus or the electronic device to execute the method for training a neural network model according to Embodiment 1.
An embodiment of the present application further provides a storage medium storing a computer-readable program, where the computer-readable program causes an information processing apparatus or an electronic device to execute the method for training a neural network model according to Embodiment 1.
The apparatus for training a neural network model described in connection with the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks and/or one or more combinations of the functional blocks illustrated in Figs. 16-19 (e.g., the ... unit, etc.) may correspond to respective software modules or respective hardware modules of a computer program flow. These software modules may correspond to the respective steps shown in Embodiment 1. These hardware modules may be implemented, for example, by solidifying these software modules using a field-programmable gate array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium; or the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of the mobile terminal or in a memory card that is insertable into the mobile terminal. For example, if the apparatus (e.g., mobile terminal) employs a relatively large capacity MEGA-SIM card or a large capacity flash memory device, the software module may be stored in the MEGA-SIM card or the large capacity flash memory device.
One or more of the functional blocks and/or one or more combinations of the functional blocks described with respect to Figs. 16-19 (e.g., the ... unit, etc.) may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof for performing the functions described herein. One or more of the functional blocks and/or one or more combinations of the functional blocks described with respect to Figs. 16-19 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
The present application has been described in conjunction with specific embodiments, but it should be understood by those skilled in the art that these descriptions are intended to be illustrative, and not limiting. Various modifications and adaptations of the present application may occur to those skilled in the art based on the teachings herein and are within the scope of the present application.

Claims (13)

1. A method of training a neural network model for handwritten digit image recognition to determine weights in the neural network model, the method comprising:
extracting a portion of the neural network model to form a neural network submodel;
training the neural network submodel using training images of handwritten digits to form an optimized neural network submodel;
initializing each weight in the neural network model according to each weight in the optimized neural network submodel to form an initialized neural network model, wherein the initialized neural network model and the optimized neural network submodel have the same output characteristics;
and adjusting each weight in the initialized neural network model based on a known training set of handwritten digit images.
2. The method of claim 1, wherein extracting a portion of the neural network model comprises:
converting a relaxation convolutional layer in the neural network model into an ordinary convolutional layer, so as to convert the neural network model into an ordinary convolutional neural network model; and
extracting part of the neuron nodes in each hidden layer of the ordinary convolutional neural network model to form the neural network submodel.
3. The method of claim 2, wherein initializing weights in the neural network model to form an initialized neural network model comprises:
initializing each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network submodel to form an initialized ordinary convolutional neural network model; and
converting the ordinary convolutional layer in the initialized ordinary convolutional neural network model into a relaxation convolutional layer to form the initialized neural network model.
4. The method of claim 2, wherein initializing weights in the neural network model to form an initialized neural network model comprises:
initializing each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network submodel to form an initialized ordinary convolutional neural network model;
adjusting each weight in the initialized ordinary convolutional neural network model based on a known training set of handwritten digit images to form an adjusted ordinary convolutional neural network model; and
converting the ordinary convolutional layer in the adjusted ordinary convolutional neural network model into a relaxation convolutional layer to form the initialized neural network model.
5. The method of claim 3, wherein initializing weights in the ordinary convolutional neural network model to form an initialized ordinary convolutional neural network model comprises:
initializing the weight of the corresponding hidden layer in the ordinary convolutional neural network model according to the weight of each hidden layer in the optimized neural network submodel to form an initialized ordinary convolutional neural network model, wherein the output characteristics of each hidden layer of the initialized ordinary convolutional neural network model are the same as the output characteristics of each hidden layer of the optimized neural network submodel.
6. The method of claim 5, wherein initializing weights of corresponding hidden layers in the common convolutional neural network model according to the weights of the hidden layers in the optimized neural network submodel comprises:
and multiplying each weight of the hidden layer in the optimized neural network submodel by a preset coefficient to serve as each weight of the corresponding hidden layer in the common convolutional neural network model.
7. An apparatus for training a neural network model for handwritten digit image recognition to determine weights in the neural network model, the apparatus comprising:
an extraction unit for extracting a portion of the neural network model to form a neural network submodel;
a first training unit for training the neural network submodel using training images of handwritten digits to form an optimized neural network submodel;
an initialization unit for initializing each weight in the neural network model according to each weight in the optimized neural network submodel to form an initialized neural network model, wherein the initialized neural network model and the optimized neural network submodel have the same output characteristics;
and a second training unit for adjusting each weight in the initialized neural network model based on a known training set of handwritten digit images.
8. The apparatus of claim 7, wherein the extraction unit comprises:
a first conversion unit for converting a relaxation convolutional layer in the neural network model into an ordinary convolutional layer, so as to convert the neural network model into an ordinary convolutional neural network model; and
an extraction subunit for extracting part of the neuron nodes in each hidden layer of the ordinary convolutional neural network model to form the neural network submodel.
9. The apparatus of claim 8, wherein the initialization unit comprises:
a first initialization subunit for initializing each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network submodel to form an initialized ordinary convolutional neural network model; and
a second conversion unit for converting the ordinary convolutional layer in the initialized ordinary convolutional neural network model into a relaxation convolutional layer to form the initialized neural network model.
10. The apparatus of claim 8, wherein the initialization unit comprises:
a second initialization subunit for initializing each weight in the ordinary convolutional neural network model according to each weight in the optimized neural network submodel to form an initialized ordinary convolutional neural network model;
a third training unit for adjusting each weight in the initialized ordinary convolutional neural network model based on a known training set of handwritten digit images to form an adjusted ordinary convolutional neural network model; and
a third conversion unit for converting the ordinary convolutional layer in the adjusted ordinary convolutional neural network model into a relaxation convolutional layer to form the initialized neural network model.
11. The apparatus of claim 9, wherein the first initialization subunit
initializes the weight of the corresponding hidden layer in the ordinary convolutional neural network model according to the weight of each hidden layer in the optimized neural network submodel to form an initialized ordinary convolutional neural network model, wherein the output characteristics of each hidden layer of the initialized ordinary convolutional neural network model are the same as the output characteristics of each hidden layer of the optimized neural network submodel.
12. The apparatus of claim 11, wherein the first initialization subunit
multiplies each weight of the hidden layer in the optimized neural network submodel by a preset coefficient, and uses the result as the corresponding weight of the corresponding hidden layer in the ordinary convolutional neural network model.
13. An electronic device comprising the apparatus of any of claims 7-12.
CN201680061886.8A 2016-03-31 2016-03-31 Method and device for training neural network model and electronic equipment Active CN108140144B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/077975 WO2017166155A1 (en) 2016-03-31 2016-03-31 Method and device for training neural network model, and electronic device

Publications (2)

Publication Number Publication Date
CN108140144A CN108140144A (en) 2018-06-08
CN108140144B true CN108140144B (en) 2021-06-01

Family

ID=59962416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680061886.8A Active CN108140144B (en) 2016-03-31 2016-03-31 Method and device for training neural network model and electronic equipment

Country Status (4)

Country Link
JP (1) JP6601569B2 (en)
KR (1) KR102161902B1 (en)
CN (1) CN108140144B (en)
WO (1) WO2017166155A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919308B (en) * 2017-12-13 2022-11-11 腾讯科技(深圳)有限公司 Neural network model deployment method, prediction method and related equipment
CN109034367A (en) * 2018-08-22 2018-12-18 广州杰赛科技股份有限公司 Neural network update method, device, computer equipment and readable storage medium storing program for executing
CN109165738B (en) * 2018-09-19 2021-09-14 北京市商汤科技开发有限公司 Neural network model optimization method and device, electronic device and storage medium
CN110288084A (en) * 2019-06-06 2019-09-27 北京小米智能科技有限公司 Super-network training method and device
US11347308B2 (en) 2019-07-26 2022-05-31 Samsung Electronics Co., Ltd. Method and apparatus with gaze tracking
KR102149495B1 (en) * 2019-08-19 2020-08-28 고려대학교 산학협력단 Optimization apparatus for training conditions of environmental prediction model and operating thereof
CN112561026A (en) * 2019-09-25 2021-03-26 北京地平线机器人技术研发有限公司 Neural network model training method and device, storage medium and electronic equipment
CN110738648A (en) * 2019-10-12 2020-01-31 山东浪潮人工智能研究院有限公司 camera shell paint spraying detection system and method based on multilayer convolutional neural network

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5452400A (en) * 1991-08-30 1995-09-19 Mitsubishi Denki Kabushiki Kaisha Method of optimizing a combination using a neural network
JP2002042107A (en) * 2000-07-31 2002-02-08 Fuji Electric Co Ltd Learning method for neural network
JP6042274B2 (en) * 2013-06-28 2016-12-14 株式会社デンソーアイティーラボラトリ Neural network optimization method, neural network optimization apparatus and program
CN104143327B (en) * 2013-07-10 2015-12-09 腾讯科技(深圳)有限公司 A kind of acoustic training model method and apparatus
CN104346622A (en) * 2013-07-31 2015-02-11 富士通株式会社 Convolutional neural network classifier, and classifying method and training method thereof
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning
WO2015118686A1 (en) * 2014-02-10 2015-08-13 三菱電機株式会社 Hierarchical neural network device, learning method for determination device, and determination method
US10832138B2 (en) * 2014-11-27 2020-11-10 Samsung Electronics Co., Ltd. Method and apparatus for extending neural network
CN104700153A (en) * 2014-12-05 2015-06-10 江南大学 PH (potential of hydrogen) value predicting method of BP (back propagation) neutral network based on simulated annealing optimization

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101183430A (en) * 2007-12-13 2008-05-21 中国科学院合肥物质科学研究院 Handwriting digital automatic identification method based on module neural network SN9701 rectangular array
CN101975092A (en) * 2010-11-05 2011-02-16 中北大学 Real-time prediction method of mine gas concentration in short and medium terms based on radial basis function neural network integration
CN102479339A (en) * 2010-11-24 2012-05-30 香港理工大学 Method and system for forecasting short-term wind speed of wind farm based on hybrid neural network
CN104751228A (en) * 2013-12-31 2015-07-01 安徽科大讯飞信息科技股份有限公司 Method and system for constructing deep neural network
CN104794527A (en) * 2014-01-20 2015-07-22 富士通株式会社 Method and equipment for constructing classification model based on convolutional neural network
CN104978601A (en) * 2015-06-26 2015-10-14 深圳市腾讯计算机系统有限公司 Neural network model training system and method
CN105184312A (en) * 2015-08-24 2015-12-23 中国科学院自动化研究所 Character detection method and device based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Handwritten Character Recognition by Alternately Trained Relaxation Convolutional Neural Network"; Chunpeng Wu et al.; 2014 14th International Conference on Frontiers in Handwriting Recognition; 2014-12-31; pp. 291-296 *
《深度卷积神经网络在计算机视觉中的应用研究综述》 [A Survey of the Application of Deep Convolutional Neural Networks in Computer Vision]; Lu Hongtao et al.; 《数据采集与处理》 [Journal of Data Acquisition and Processing]; 2016-01-31; Vol. 31, No. 1; pp. 1-17 *

Also Published As

Publication number Publication date
KR102161902B1 (en) 2020-10-05
CN108140144A (en) 2018-06-08
JP2019508803A (en) 2019-03-28
JP6601569B2 (en) 2019-11-06
KR20180084969A (en) 2018-07-25
WO2017166155A1 (en) 2017-10-05

Similar Documents

Publication Publication Date Title
CN108140144B (en) Method and device for training neural network model and electronic equipment
CN109241880B (en) Image processing method, image processing apparatus, computer-readable storage medium
US10977530B2 (en) ThunderNet: a turbo unified network for real-time semantic segmentation
CN108764195B (en) Handwriting model training method, handwritten character recognition method, device, equipment and medium
CN110738090A (en) System and method for end-to-end handwritten text recognition using neural networks
CN109508717A (en) A kind of licence plate recognition method, identification device, identification equipment and readable storage medium storing program for executing
CN111275107A (en) Multi-label scene image classification method and device based on transfer learning
CN111382867B (en) Neural network compression method, data processing method and related devices
CN111340077B (en) Attention mechanism-based disparity map acquisition method and device
CN109086653B (en) Handwriting model training method, handwritten character recognition method, device, equipment and medium
US20210390370A1 (en) Data processing method and apparatus, storage medium and electronic device
WO2020258902A1 (en) Image generating and neural network training method, apparatus, device, and medium
CN114138231B (en) Method, circuit and SOC for executing matrix multiplication operation
CN111985414A (en) Method and device for determining position of joint point
CN112989843B (en) Intention recognition method, device, computing equipment and storage medium
CN107729885B (en) Face enhancement method based on multiple residual error learning
CN107564013B (en) Scene segmentation correction method and system fusing local information
CN113434648A (en) Meta learning method, device and equipment of text classification model and storage medium
CN110009091B (en) Optimization of learning network in equivalence class space
CN113435519B (en) Sample data enhancement method, device, equipment and medium based on countermeasure interpolation
CN111402164B (en) Training method and device for correction network model, text recognition method and device
CN113312445A (en) Data processing method, model construction method, classification method and computing equipment
CN114548355A (en) CNN training method, electronic device, and computer-readable storage medium
CN112509559A (en) Audio recognition method, model training method, device, equipment and storage medium
Li et al. MA-NET: Multi-scale attention-aware network for optical flow estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant