WO2017166155A1 - Method and device for training neural network model, and electronic device - Google Patents

Method and device for training neural network model, and electronic device

Info

Publication number
WO2017166155A1
WO2017166155A1 (PCT/CN2016/077975)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
network model
model
layer
training
Prior art date
Application number
PCT/CN2016/077975
Other languages
French (fr)
Chinese (zh)
Inventor
陈理
王淞
范伟
孙俊
直井聪
Original Assignee
富士通株式会社
陈理
王淞
范伟
孙俊
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 富士通株式会社, 陈理, 王淞, 范伟, 孙俊 filed Critical 富士通株式会社
Priority to PCT/CN2016/077975 priority Critical patent/WO2017166155A1/en
Priority to JP2018539870A priority patent/JP6601569B2/en
Priority to CN201680061886.8A priority patent/CN108140144B/en
Priority to KR1020187017577A priority patent/KR102161902B1/en
Publication of WO2017166155A1 publication Critical patent/WO2017166155A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Definitions

  • the present application relates to the field of information processing technologies, and in particular, to a method, device, and electronic device for training a neural network model.
  • CNN Convolutional Neural Network
  • the CNN model is a hierarchical model.
  • FIG. 1 is a schematic diagram of the CNN model.
  • the CNN model is composed of an input layer 101, a plurality of hidden layers 102, and an output layer 103.
  • the input layer 101 provides data to be processed corresponding to the sample to be identified.
  • the sample to be identified is a grayscale image
  • the data to be processed is a two-dimensional matrix; the type of a hidden layer 102 may be a common convolutional layer, a relaxed convolutional layer, a pooling layer, a neuron layer, or a fully connected layer, each of which provides a specific operation to process the data.
  • the output layer 103 provides the final result of the model; for a CNN model used for classification, the output layer 103 outputs the probability that the sample to be identified belongs to each class.
  • the large-scale CNN model has the following problems during training: a) the larger the model, the easier it is to overfit; b) the larger the model, the longer the training time required.
  • the embodiments of the present application provide a method, a device, and an electronic device for training a neural network model, in which a small-scale neural network model is trained, a large-scale neural network model is initialized by the small-scale neural network model, and finally the initialized large-scale neural network model is fine-tuned, thereby avoiding problems such as over-fitting and excessive training time caused by directly training a large-scale neural network.
  • a method for training a neural network model, for determining the weights in the neural network model, the method comprising:
  • Each weight in the initialized neural network model is adjusted based on a known training set.
  • extracting a portion of the neural network model comprises:
  • a portion of the neuron nodes in each of the hidden layers of the common convolutional neural network model are extracted to form the neural network sub-model.
  • initializing each weight in the neural network model to form an initialization neural network model includes:
  • the normal convolutional layer in the initialized general convolutional neural network model is transformed into a relaxed convolutional layer to form the initialized neural network model.
  • initializing each weight in the neural network model to form an initialization neural network model includes:
  • the normal convolutional layer in the adjusted ordinary convolutional neural network model is transformed into a relaxed convolutional layer to form the initialized neural network model.
  • initializing each weight in the common convolutional neural network model to form an initialized common convolutional neural network model includes:
  • initializing the weights of the corresponding hidden layers in the common convolutional neural network model according to the weights of the respective hidden layers in the optimized neural network sub-model includes:
  • the weights of the hidden layers in the optimized neural network sub-model are multiplied by predetermined coefficients as the weights of the corresponding hidden layers in the common convolutional neural network model.
  • an apparatus for training a neural network model for determining weights in a neural network model comprising:
  • An extracting unit for extracting a portion of the neural network model to form a neural network sub-model
  • a first training unit for training the neural network sub-model to form an optimized neural network sub-model
  • An initialization unit that initializes each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialization neural network model, and the initialization neural network model and the optimization Neural network submodels have the same output characteristics;
  • a second training unit that adjusts weights in the initialized neural network model based on a known training set.
  • the extracting unit includes:
  • a first conversion unit for converting a relaxed convolutional layer in the neural network model into a common convolutional layer to convert the neural network model into a general convolutional neural network model
  • An extraction subunit is configured to extract a partial neuron node in each hidden layer of the common convolutional neural network model to form the neural network submodel.
  • the initialization unit includes:
  • a first initialization subunit configured to initialize each weight in the common convolutional neural network model according to each weight in the optimized neural network submodel to form an initialization common convolutional neural network model
  • a second transformation unit for converting a common convolutional layer in the initialized general convolutional neural network model into a relaxed convolutional layer to form the initialization neural network model.
  • the initializing unit includes:
  • a second initialization subunit that initializes, according to each weight in the optimized neural network submodel, each weight in the common convolutional neural network model to form an initialized common convolutional neural network model
  • a third training unit that adjusts weights in the initialized general convolutional neural network model based on a known training set to form an adjusted general convolutional neural network model
  • a third transformation unit for converting a common convolutional layer in the adjusted general convolutional neural network model into a relaxed convolutional layer to form the initialization neural network model.
  • the first initializing subunit initializes the weights of the corresponding hidden layers in the common convolutional neural network model according to the weights of the respective hidden layers in the optimized neural network submodel, to form an initialized common convolutional neural network model, wherein the output characteristics of the hidden layers of the initialized common convolutional neural network model are the same as the output characteristics of the hidden layers of the optimized neural network submodel.
  • the first initializing subunit multiplies the weights of the hidden layer in the optimized neural network submodel by a predetermined coefficient and uses the results as the weights of the corresponding hidden layer in the common convolutional neural network model.
  • an electronic device comprising the apparatus for training a neural network model according to any one of the seventh to twelfth aspects of the embodiments.
  • a fourteenth aspect of the embodiments of the present application provides a computer readable program, wherein when the program is executed in a device or an electronic device that trains a neural network model, the program causes the device or the electronic device that trains the neural network model to perform the method of training the neural network model described in any one of the first to sixth aspects of the above embodiments.
  • a storage medium storing a computer readable program, wherein the storage medium stores the computer readable program of the fourteenth aspect of the above embodiments, and the computer readable program causes a device or an electronic device that trains a neural network model to perform the method of training a neural network model described in any one of the first to sixth aspects of the above embodiments.
  • the beneficial effects of the embodiments of the present application are: shortening the training time of large-scale neural networks and avoiding over-fitting problems.
  • Figure 1 is a schematic diagram of a CNN model
  • FIG. 2 is a schematic diagram of a neural network model of Embodiment 1;
  • FIG. 3 is a schematic diagram of a method of training a neural network model of Embodiment 1;
  • FIG. 4 is a schematic diagram of a method of extracting a portion of a neural network model of Embodiment 1;
  • FIG. 5 is a schematic diagram of a conventional convolutional neural network model of Embodiment 1;
  • FIG. 6 is a schematic diagram showing a processing manner of the relaxed convolution layer of Embodiment 1;
  • FIG. 7 is a schematic diagram showing a processing manner of a general convolution layer of Embodiment 1;
  • FIG. 8 is a schematic diagram of a neural network sub-model of Embodiment 1;
  • FIG. 9 is a schematic diagram of a method for initializing weights in a neural network model in Embodiment 1;
  • Figure 10 (A) is a schematic diagram of an input layer and a convolution layer of the optimized neural network sub-model
  • Figure 10 (B) is a schematic diagram of an input layer and a convolution layer of an ordinary convolutional neural network model after initialization;
  • Figure 11 (A) is a schematic diagram of a pooled layer and a convolutional layer of the optimized neural network sub-model
  • Figure 11 (B) is a schematic diagram of a pooling layer and a convolution layer of an ordinary convolutional neural network model after initialization;
  • Figure 12 (A) is a partial schematic view of the pooling layer and the convolution layer of Figure 11 (A);
  • Figure 12 (B) is a partial schematic view of the pooling layer and the convolution layer of Figure 11 (B);
  • Figure 13 (A) is a schematic diagram of the fully connected layer of the optimized neural network sub-model and its previous hidden layer
  • Figure 13 (B) is a schematic diagram of the fully connected layer of the normal convolutional neural network model after initialization and its previous hidden layer;
  • Figure 14 (A) is a partial schematic view of the fully connected layer of Figure 13 (A) and its previous hidden layer;
  • Figure 14 (B) is a partial schematic view of the fully connected layer of Figure 13 (B) and its previous hidden layer;
  • FIG. 15 is another schematic diagram of a method for initializing weights in a neural network model in Embodiment 1;
  • FIG. 16 is a schematic diagram of an apparatus for training a neural network model of Embodiment 2;
  • Figure 17 is a schematic illustration of the extraction unit of Embodiment 2.
  • Figure 18 is a schematic diagram of an initialization unit of Embodiment 2;
  • Figure 19 is another schematic diagram of the initialization unit of Embodiment 2.
  • FIG. 20 is a schematic block diagram showing the system configuration of the electronic device 2000 of the third embodiment.
  • Embodiment 1 of the present application provides a method of training a neural network model for determining weights in a neural network model.
  • the neural network model 200 includes an input layer 201, a convolutional layer 202, a pooling layer 203, a relaxed convolutional layer 204, a fully connected layer 205, and an output layer 206, wherein the convolutional layer 202, the pooling layer 203, the relaxed convolutional layer 204, and the fully connected layer 205 are all hidden layers.
  • the input layer 201 can input the data to be identified 2011; each of the convolutional layer 202, the pooling layer 203, the relaxed convolutional layer 204, the fully connected layer 205, and the output layer 206 receives the data output by the previous layer, processes it with the weights corresponding to that layer to generate the data output by that layer, and outputs the data from the neuron nodes of that layer; the neuron nodes of the respective layers are 2021-2024, 2031-2034, 2041-2046, 2051-2058, and 2061-20610, and, based on the data output by the neuron nodes of the output layer 206, the probability that the data to be identified 2011 belongs to each category 206a can be determined; further, in FIG. 2, only neuron nodes 2021, 2024, 2031, 2034, 2041, 2046, 2051, 2058, 2061, and 20610 are labeled, and the other neuron nodes are not labeled.
  • the data to be identified 2011 input by the input layer 201 may be a handwritten digit image; the data output by the neuron nodes of the convolutional layer 202, the pooling layer 203, and the relaxed convolutional layer 204 may be feature maps; and the data output by the neuron nodes of the fully connected layer 205 and the output layer 206 may be numerical values.
  • each of the digits 0-9 may correspond to a category 206a; therefore, based on the data output by the output layer 206, the probability that the data to be identified 2011 belongs to each of the digits 0-9 can be determined.
  • the weights corresponding to each layer are selected to ensure that the classification result output by the output layer 206 is accurate, wherein the weight corresponding to each layer may be an m*n matrix, where m and n are both natural numbers.
  • the method for training the neural network model in this embodiment is for determining the weight corresponding to each layer in the neural network model.
  • the neural network model 200 has a relaxed convolution layer 204. Therefore, the neural network model 200 belongs to a convolutional neural network model.
  • however, the neural network model 200 of the present embodiment may also have no relaxed convolutional layer 204; this embodiment is not limited in this respect. Moreover, the method for training the neural network model described in this embodiment is applicable not only to the convolutional neural network model but also to other neural network models.
  • FIG. 3 is a schematic diagram of a method for training a neural network model according to the embodiment. As shown in FIG. 3, the method includes: step S301, extracting a part of the neural network model to form a neural network sub-model; step S302, training the neural network sub-model to form an optimized neural network sub-model; step S303, initializing each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialized neural network model; and step S304, adjusting each weight in the initialized neural network model based on a known training set.
  • in this way, a small-scale sub-model is trained first, the trained sub-model is then used to initialize the large-scale neural network model, and finally the large-scale neural network model is fine-tuned.
  • compared with a method of directly training a large-scale neural network model, the method of the present embodiment can avoid problems such as overfitting and excessive training time, as sketched below.
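  • As a reading aid, the following framework-agnostic sketch restates the order of steps S301-S304; the helper callables (extract_submodel, train, initialize_from_submodel) are hypothetical names supplied by the caller and are not defined by the patent.

```python
# Framework-agnostic sketch of the flow of FIG. 3 (steps S301-S304).
# extract_submodel, train and initialize_from_submodel are hypothetical
# callables supplied by the caller; only the ordering of the steps matters here.
def train_via_submodel(full_model, train_set,
                       extract_submodel, train, initialize_from_submodel,
                       keep_ratio=0.5, submodel_epochs=50, finetune_epochs=5):
    sub_model = extract_submodel(full_model, keep_ratio)      # S301: extract a part of the model
    train(sub_model, train_set, epochs=submodel_epochs)       # S302: train the small sub-model fully
    initialize_from_submodel(full_model, sub_model)           # S303: copy/scale weights into the full model
    train(full_model, train_set, epochs=finetune_epochs)      # S304: only a few rounds of fine-tuning
    return full_model
```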
  • FIG. 4 is a schematic diagram of a method of extracting a part of a neural network model of the embodiment, as shown in FIG. 4, the method includes:
  • step S401 the relaxed convolutional layer 204 in the neural network model 200 of FIG. 2 is converted into a normal convolutional layer, thereby transforming the neural network model 200 from a convolutional neural network model to a general convolutional neural network model.
  • FIG. 5 is a schematic diagram of the common convolutional neural network model 500, in which the relaxed convolutional layer 204 has been transformed into a common convolutional layer 504; the other layers of the common convolutional neural network model 500 are identical to those of the neural network model 200.
  • the data processing manner of the common convolutional layer 504 differs from that of the relaxed convolutional layer 204 in that, in the common convolutional layer 504, the data at different positions within the same neuron node participating in the convolution operation share a weight, whereas in the relaxed convolutional layer 204, the data at different positions within the same neuron node participating in the convolution operation do not share any weight.
  • FIG. 6 is a schematic diagram of the processing manner of the relaxed convolutional layer 204 of the present embodiment.
  • in FIG. 6, P1 and P2 are different neuron nodes participating in the convolution operation, P11 and P14 are data at different positions in P1, P21 and P24 are data at different positions in P2, W11, W14, W21, and W24 are different weights, and T11 and T14 are data generated by the convolution operation, wherein T11 and T14 are calculated as shown in the following equations (1) and (2):

T11 = P11 × W11 + P21 × W21    (1)
T14 = P14 × W14 + P24 × W24    (2)
  • that is, in FIG. 6, the data P11 and P14 in the neuron node P1 correspond to the independent weights W11 and W14, respectively, and the data P21 and P24 in the neuron node P2 correspond to the independent weights W21 and W24, respectively; in other words, data at different positions within the same neuron node do not share any weight.
  • FIG. 7 shows the processing manner of the common convolutional layer, in which T11 and T14 are calculated as shown in the following equations (3) and (4):

T11 = P11 × W1 + P21 × W2    (3)
T14 = P14 × W1 + P24 × W2    (4)
  • the data P11 and P14 at different positions in the neuron node P1 share the weight W1
  • the data P21 and P24 at different positions in the neuron node P2 share the weight W2.
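  • The contrast between the two layer types can be checked numerically. The sketch below is an illustrative numpy rendering (not taken from the patent) that treats P1 and P2 as one-dimensional maps and assumes the output is a position-wise weighted sum of the participating maps; the only difference between the two cases is whether the weights are shared across positions.

```python
import numpy as np

# P1, P2: two input neuron nodes (feature maps), simplified to 1-D maps of 4 positions.
positions = 4
rng = np.random.default_rng(0)
P1 = rng.standard_normal(positions)
P2 = rng.standard_normal(positions)

# Common convolutional layer (FIG. 7): every position of the same input map
# shares a single weight (W1 for P1, W2 for P2).
W1, W2 = 0.3, -0.7
T_common = P1 * W1 + P2 * W2                   # e.g. T11 = P11*W1 + P21*W2

# Relaxed convolutional layer (FIG. 6): every position of the same input map
# has its own independent weight (W11..W14 for P1, W21..W24 for P2).
W1_relaxed = rng.standard_normal(positions)
W2_relaxed = rng.standard_normal(positions)
T_relaxed = P1 * W1_relaxed + P2 * W2_relaxed  # e.g. T11 = P11*W11 + P21*W21

# Same output size, but the relaxed layer carries `positions` times more weights.
print(T_common.shape, T_relaxed.shape)
```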
  • in step S401 of the present embodiment, part of the weights in the relaxed convolutional layer 204 of the neural network model 200 may be deleted to reduce the number of weights, so that the data at different positions within the same neuron node participating in the convolution operation share a weight, thereby transforming the relaxed convolutional layer 204 into a common convolutional layer 504 and converting the neural network model 200 into the common convolutional neural network model 500.
  • in step S402 of the present embodiment, the neuron nodes of each hidden layer of the common convolutional neural network model 500 can be deleted according to a certain proportion, thereby obtaining the neural network sub-model, wherein the proportion of neuron nodes deleted from each hidden layer can be the same or different.
  • FIG. 8 is a schematic diagram of the neural network sub-model 800 of the present embodiment.
  • in FIG. 8, 50% of the neuron nodes in each hidden layer are deleted, thereby forming the neural network sub-model 800, wherein 801 in FIG. 8 denotes a neuron node deleted from the common convolutional neural network model 500; the input layer 201 and the output layer 206 of FIG. 8 are the same as the input layer 201 and the output layer 206 of FIG. 5, respectively; and the convolutional layer 802, the pooling layer 803, the convolutional layer 804, and the fully connected layer 805 of FIG. 8 correspond to the convolutional layer 202, the pooling layer 203, the common convolutional layer 504, and the fully connected layer 205 of FIG. 5, respectively.
  • in this embodiment, the neural network model 200 is first converted into a common convolutional neural network model, and then neuron nodes are deleted from the common convolutional neural network model to obtain the neural network sub-model 800.
  • the purpose of transforming the neural network model 200 into a common convolutional neural network model is to reduce the number of weights in the subsequently generated neural network sub-model 800 and avoid over-fitting.
  • however, the present embodiment is not limited thereto; if the neural network model 200 does not have the relaxed convolutional layer 204, neuron nodes can be deleted directly from the neural network model 200, as sketched below.
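  • A minimal sketch of the node-extraction idea, under the assumption that the hidden layers are stored as plain weight matrices; keeping simply the first half of the nodes and using a fully connected representation are illustrative choices, not prescribed by the patent.

```python
import numpy as np

# Keep only a fraction of the neuron nodes (here: hidden units / output channels)
# of a hidden layer, and drop the corresponding inputs of the next layer.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((784, 8))   # hidden layer of the full model: 8 neuron nodes
W2 = rng.standard_normal((8, 10))    # output layer: 10 classes (kept in full)

keep_ratio = 0.5
kept = np.arange(W1.shape[1])[: int(W1.shape[1] * keep_ratio)]   # indices of kept nodes

W1_sub = W1[:, kept]                 # sub-model hidden layer: 4 neuron nodes
W2_sub = W2[kept, :]                 # next layer only receives the kept nodes' outputs
print(W1_sub.shape, W2_sub.shape)    # (784, 4) (4, 10)
```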
  • in this embodiment, the neural network sub-model 800 can be trained according to a known training set to determine an optimized value of each of its weights, thereby training the neural network sub-model 800 into an optimized neural network sub-model.
  • the method for training the neural network sub-model 800 can refer to the prior art, and details are not described in this embodiment.
  • the optimized neural network sub-model can then be used to initialize the weights in the neural network model 200, and the initialized neural network model 200 has the same output characteristics as the optimized neural network sub-model.
  • FIG. 9 is a schematic diagram of a method for initializing each weight in the neural network model according to the embodiment, for implementing step S303. As shown in FIG. 9, the method includes:
  • step S901: initializing, according to each weight in the optimized neural network sub-model, each weight in the common convolutional neural network model to form an initialized common convolutional neural network model; and step S902: converting the common convolutional layer in the initialized common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
  • the weights of the corresponding hidden layers in the common convolutional neural network model 500 may be initialized according to the weights of the hidden layers in the optimized neural network submodel.
  • the weights of the hidden layers in the optimized neural network submodel are multiplied by a predetermined coefficient as the weights of the corresponding hidden layers in the general convolutional neural network model.
  • the convolutional layer 202 is connected to the input layer 201 and is the first hidden layer after the input layer 201.
  • the input data of the convolutional layer 202 is the data to be identified of the input layer 201, and the data to be identified is convoluted with the weight of the convolutional layer 202 to obtain the output data of each neuron node of the convolutional layer 202.
  • FIG. 10(A) is a schematic diagram of the input layer 201 and the convolution layer 802 of the optimized neural network sub-model 800
  • FIG. 10(B) is a schematic diagram of the input layer 201 and the convolutional layer 202 of the initialized common convolutional neural network model 500.
  • in FIG. 10(A), the input data is convolved with the weight K1 to obtain the feature map A1 output by the neuron node 8021, and the input data is convolved with the weight K2 to obtain the feature map A2 output by the other neuron node of the convolutional layer 802.
  • in FIG. 10(B), the weight K1 is multiplied by a predetermined coefficient L11 and used as the weight corresponding to the neuron nodes 2021 and 2023 of the convolutional layer 202 of the common convolutional neural network model 500, and the weight K2 is multiplied by a predetermined coefficient L12 and used as the weight corresponding to the neuron nodes 2022 and 2024 of the convolutional layer 202 of the common convolutional neural network model 500; the weights of the convolutional layer 202 of the common convolutional neural network model 500 are thus initialized.
  • in this embodiment, the predetermined coefficients L11 and L12 may both be 1; therefore, the feature maps output by the neuron nodes 2021, 2022, 2023, and 2024 of the convolutional layer 202 of the common convolutional neural network model 500 are A1, A2, A1, and A2, respectively.
  • L11, L12 may have other values, and may be different from each other.
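  • The copy-and-scale initialization of the first convolutional layer can be sketched as follows; the kernel shapes, the interleaved copy order (K1, K2, K1, K2), and the use of numpy are illustrative assumptions rather than details fixed by the patent.

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.standard_normal((2, 3, 3))      # sub-model kernels K1, K2 (2 neuron nodes, 3x3 each)
L = np.array([1.0, 1.0])                # predetermined coefficients L11, L12

# The widened first layer has twice as many nodes:
# node 2021 <- L11*K1, node 2022 <- L12*K2, node 2023 <- L11*K1, node 2024 <- L12*K2.
K_full = np.concatenate([L[:, None, None] * K,
                         L[:, None, None] * K], axis=0)
print(K_full.shape)                     # (4, 3, 3)

# With L11 = L12 = 1 the four nodes produce the feature maps A1, A2, A1, A2,
# i.e. the widened layer reproduces the sub-model's outputs (duplicated).
assert np.allclose(K_full[0], K[0]) and np.allclose(K_full[2], K[0])
```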
  • the feature map output by each neuron node of the convolution layer 202 is used as input data of the pooling layer 203
  • the feature map output by the pooling layer 203 is used as input data of the convolution layer 504
  • the convolution layer 504 is a non-first hidden layer.
  • FIG. 11(A) is a schematic diagram of the pooled layer 803 and the convolution layer 804 of the optimized neural network submodel 800
  • FIG. 11(B) is a schematic diagram of the pooling layer 203 and the convolutional layer 504 of the initialized common convolutional neural network model 500.
  • the feature maps B1, B2 outputted by the respective neuron nodes of the pooling layer 803 are used to generate feature maps C1-C3 of the respective neuron nodes of the convolutional layer 804.
  • A1 and A2 of FIG. 10(A) and the corresponding weights are respectively pooled to obtain B1 and B2.
  • the feature maps B1-B4 outputted by the respective neuron nodes of the pooling layer 203 are used to generate feature maps C1'-C6' of the respective neuron nodes of the convolutional layer 504.
  • A1-A4 of FIG. 10(B) and the corresponding weights are respectively pooled to obtain B1-B4, and each weight in the pooling layer 203 may be initialized such that the pooling layer 203 of the common convolutional neural network model 500 has the same output characteristics as the pooling layer 803 of the optimized neural network sub-model 800.
  • FIG. 12(A) is a partial schematic view of the pooling layer 803 and the convolution layer 804 of FIG. 11(A)
  • FIG. 12(B) is a partial schematic view of the pooling layer 203 and the convolution layer 504 of FIG. 11(B).
  • in FIGS. 12(A) and 12(B), the corresponding weights are shown.
  • in FIG. 12(B), the weight K3 is multiplied by a predetermined coefficient L21 to obtain K3', which is used as the weight corresponding to the feature maps B1 and B3 in the convolutional layer 504 of the common convolutional neural network model 500, and the weight K4 is multiplied by a predetermined coefficient L22 to obtain K4', which is used as the weight corresponding to the feature maps B2 and B4; the weights of the convolutional layer 504 of the common convolutional neural network model 500 are thus initialized.
  • thereby, C1' = C1, so that the convolutional layer 504 of the initialized common convolutional neural network model 500 has the same output characteristics as the convolutional layer 804 of the optimized neural network sub-model 800.
  • L21, L22 may have other values, and may be different from each other.
  • similarly, the weights between B1-B4 and C2' of FIG. 11(B) can be initialized in a manner similar to the above, using the weights between B1, B2, and C2 of FIG. 11(A); the weights between B1-B4 and C4' of FIG. 11(B) can be initialized using the weights between B1, B2, and C1 of FIG. 11(A); and the weights between B1-B4 and C5' of FIG. 11(B) can be initialized using the weights between B1, B2, and C2 of FIG. 11(A).
  • the initialization methods of the weights in the other convolutional layers are similar to the initialization methods for the weights in the convolutional layer 504.
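  • For a non-first hidden layer the input feature maps themselves are duplicated, so the copied incoming weights have to be scaled if the outputs are to stay numerically identical. The sketch below uses 1×1 maps (so the convolution degenerates to a matrix product) and assumes a coefficient of 0.5, i.e. the kept ratio; the patent only speaks of a "predetermined coefficient", so this particular value is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal(2)                    # sub-model inputs B1, B2 (1x1 feature maps)
K_sub = rng.standard_normal((2, 3))           # sub-model weights from B1, B2 to C1..C3
C = B @ K_sub                                 # sub-model outputs C1..C3

B_full = np.concatenate([B, B])               # full-model inputs B1..B4 with B3 = B1, B4 = B2
coeff = 0.5                                   # assumed predetermined coefficient (= kept ratio)
K_scaled = coeff * np.concatenate([K_sub, K_sub], axis=0)   # weights from B1..B4 (K3', K4' reused for B3, B4)
K_full = np.concatenate([K_scaled, K_scaled], axis=1)       # duplicate the output nodes C1'..C6'
C_full = B_full @ K_full

# The widened layer reproduces the sub-model's outputs (duplicated): C1'..C6' = C1, C2, C3, C1, C2, C3.
assert np.allclose(C_full, np.concatenate([C, C]))
```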
  • the fully connected layer 205 may be located after all the convolutional layers; the output layer 206 may be connected after the fully connected layer 205, that is, the fully connected layer 205 is the last hidden layer, or other fully connected layers may be connected after the fully connected layer 205, that is, the fully connected layer 205 is a non-final hidden layer.
  • the initialization method for each weight of the fully connected layer 205 may refer to the initialization method for the convolutional layer 504 located after the first hidden layer, except that in the fully connected layer 205 the convolution operation is replaced by a multiplication operation, because the fully connected layer 205 can be regarded as a convolutional layer whose weights are 1×1.
  • the fully connected layer 205, which is the last hidden layer, has as many neuron nodes as there are classes in the output layer 206; therefore, for the fully connected layer 805 of the optimized neural network sub-model 800 and the fully connected layer 205 of the initialized common convolutional neural network model 500, the number of neuron nodes is the same, but the number of input data of the two may differ.
  • FIG. 13(A) is a schematic diagram of the fully connected layer 805 of the optimized neural network submodel 800 and its previous hidden layer
  • FIG. 13(B) is a schematic diagram of the fully connected layer 205 of the initialized common convolutional neural network model 500 and its previous hidden layer.
  • the data F1, F2 outputted by the neuron nodes of the previous hidden layer are used to generate the output data E1-E3 of the respective neuron nodes of the fully connected layer 805.
  • the data F1-F4 outputted by the respective neuron nodes of the previous hidden layer are used to generate the output data E1'-E3' of the respective neuron nodes of the fully connected layer 205.
  • the data F1-F4 may be in the form of a floating point number.
  • in FIG. 13(A), the number of neuron nodes of the previous hidden layer is reduced by half compared with that of FIG. 13(B), owing to the earlier operation of extracting the sub-model.
  • the number of data outputted by the previous hidden layer in Fig. 13(A) is also half the number of data outputted by the previous hidden layer in Fig. 13(B).
  • FIG. 14(A) is a partial schematic view of the fully connected layer 805 of FIG. 13(A) and its previous hidden layer, and FIG. 14(B) is a partial schematic view of the fully connected layer 205 of FIG. 13(B) and its previous hidden layer.
  • in FIG. 14(A), F1 and F2 are multiplied by the weights K5 and K6 to obtain the data E1 output by the neuron node 8051, the multiplication being as shown in the following equation (7):

E1 = F1 × K5 + F2 × K6    (7)
  • in FIG. 14(B), the weight K5 is multiplied by a predetermined coefficient L31 to obtain K5', which is used as the weight corresponding to F1 and F3 in the fully connected layer 205 of the common convolutional neural network model 500, and the weight K6 is multiplied by a predetermined coefficient L32 to obtain K6', which is used as the weight corresponding to F2 and F4 in the fully connected layer 205 of the common convolutional neural network model 500; the weights of the fully connected layer 205 of the common convolutional neural network model 500 are thus initialized.
  • thereby, E1' = E1, so that the fully connected layer 205 of the initialized common convolutional neural network model 500 has the same output characteristics as the fully connected layer 805 of the optimized neural network sub-model 800.
  • L31 and L32 may have other values, and may be different from each other.
  • similarly, the weights between F1-F4 and E2' of FIG. 13(B) can be initialized in a manner similar to the above, using the weights between F1, F2, and E2 of FIG. 13(A).
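  • The same idea applies to the last fully connected layer, where the convolution degenerates to a multiplication and the number of output nodes (classes) stays fixed; as before, the coefficient 0.5 used below is an illustrative assumption that makes E1' equal E1 exactly, not a value prescribed by the patent.

```python
import numpy as np

rng = np.random.default_rng(3)
F = rng.standard_normal(2)                    # sub-model inputs F1, F2
K56 = rng.standard_normal((2, 10))            # sub-model weights to the 10 class nodes E1..E10
E = F @ K56                                   # e.g. E1 = F1*K5 + F2*K6, cf. equation (7)

F_full = np.concatenate([F, F])               # full-model inputs F1..F4 with F3 = F1, F4 = F2
K_full = 0.5 * np.concatenate([K56, K56], axis=0)   # K5', K6' reused for the duplicated inputs F3, F4
E_full = F_full @ K_full                      # class nodes are not duplicated

assert np.allclose(E_full, E)                 # same output characteristics: identical class scores
```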
  • the weights in the ordinary convolutional neural network model 500 shown in FIG. 5 can be initialized.
  • the common convolutional neural network model 500 is not limited to the structure shown in FIG. 5.
  • the ordinary convolutional neural network model 500 may have other hidden layers.
  • the process of converting the common convolutional layer into the relaxed convolutional layer may be the reverse of step S401; that is, in step S902, the weights in the common convolutional layer may be copied into multiple copies, so that the data at different positions within the same neuron node participating in the convolution operation correspond to different weights; for example, the weight W1 in FIG. 7 is copied into W11 and W14, and the weight W2 is copied into W21 and W24, thereby converting the common convolutional layer into a relaxed convolutional layer.
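  • Converting a common convolutional layer back into a relaxed one therefore amounts to giving every output position its own copy of each shared weight; the sketch below assumes a 2×2 output grid and two participating input maps purely for illustration.

```python
import numpy as np

W_shared = np.array([0.3, -0.7])              # one shared weight per participating input map (W1, W2)
out_positions = (2, 2)                        # positions of the output feature map (illustrative)

# Relaxed layer: an independent copy of every shared weight at every output position,
# e.g. W1 -> W11..W14 and W2 -> W21..W24.
W_relaxed = np.broadcast_to(W_shared[:, None, None],
                            (W_shared.size, *out_positions)).copy()
print(W_relaxed.shape)                        # (2, 2, 2)

# Right after the copy the relaxed layer computes exactly what the common layer
# computed; the copies only start to differ during subsequent fine-tuning.
assert np.allclose(W_relaxed[0], W_shared[0]) and np.allclose(W_relaxed[1], W_shared[1])
```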
  • FIG. 15 is another schematic diagram of a method for initializing each weight in the neural network model in the embodiment, for implementing step S303. As shown in Figure 15, the method includes:
  • step S901: initializing, according to each weight in the optimized neural network sub-model, each weight in the common convolutional neural network model to form an initialized common convolutional neural network model; step S1501: adjusting, based on a known training set, each weight in the initialized common convolutional neural network model to form an adjusted common convolutional neural network model; and step S1502: converting the common convolutional layer in the adjusted common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
  • compared with the method of FIG. 9, step S1501 is added to the method of FIG. 15; that is, after the common convolutional neural network model is initialized, the common convolutional neural network model is adjusted based on the known training set, whereby the amount of work required for the adjustment in step S304 can be reduced.
  • regarding step S1501, reference may be made to the method for adjusting a neural network model in the prior art, which is not described in detail in this embodiment.
  • the processing manner of converting the common convolutional layer into the relaxed convolutional layer in step S1502 may be the same as the processing manner of step S902.
  • in the above description, the weights in the common convolutional neural network model are initialized in step S901, and the common convolutional layer is converted into a relaxed convolutional layer by step S902 or by steps S1501 and S1502, so as to initialize the weights of the neural network model 200; however, the embodiment is not limited thereto, and if the neural network model 200 does not have a relaxed convolutional layer, the weights of the neural network model 200 may be directly initialized in step S901 without step S902 or steps S1501 and S1502.
  • in step S304, the weights in the initialized neural network model may be adjusted based on the known training set; for the manner of the adjustment, reference may be made to the prior art, which is not described in detail in this embodiment.
  • in this embodiment, a small-scale neural network model is trained, a large-scale neural network model is initialized by the small-scale neural network model, and finally the initialized large-scale neural network model is fine-tuned; because the small-scale network has already completed most of the training work, the large-scale network only needs to be fine-tuned for several rounds to converge, thereby avoiding the over-fitting and excessive training time caused by directly training a large-scale neural network.
  • Embodiment 2 provides an apparatus for training a neural network model, which corresponds to the method of Embodiment 1.
  • the apparatus 1600 includes: an extracting unit 1601, a first training unit 1602, an initializing unit 1603, and a second training unit 1604.
  • the extracting unit 1601 is configured to extract a part of the neural network model to form a neural network sub-model; the first training unit 1602 is configured to train the neural network sub-model to form an optimized neural network sub-model; the initializing unit 1603 initializes each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialized neural network model, the initialized neural network model and the optimized neural network sub-model having the same output characteristics; and the second training unit 1604 adjusts each weight in the initialized neural network model based on the known training set.
  • the extracting unit 1601 includes a first converting unit 1701 and an extracting subunit 1702.
  • the first conversion unit 1701 is configured to convert the relaxed convolution layer in the neural network model into a common convolution layer to convert the neural network model into a common convolutional neural network model; and the extraction subunit 1702 is used to A portion of the neuron nodes in each of the hidden layers of the common convolutional neural network model are extracted to form the neural network sub-model.
  • FIG. 18 is a schematic diagram of the initialization unit 1603 of the second embodiment. As shown in FIG. 18, the initialization unit 1603 includes a first initialization subunit 1801 and a second conversion unit 1802.
  • the first initialization subunit 1801 is configured to initialize each weight in the common convolutional neural network model according to each weight in the optimized neural network submodel to form an initialization common convolutional neural network model;
  • the second transforming unit 1802 is configured to convert the normal convolutional layer in the initialized normal convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
  • in this embodiment, the first initialization subunit 1801 may initialize the weights of the corresponding hidden layers in the common convolutional neural network model according to the weights of the hidden layers in the optimized neural network sub-model, to form the initialized common convolutional neural network model, wherein the output characteristics of the hidden layers of the initialized common convolutional neural network model are the same as the output characteristics of the hidden layers of the optimized neural network sub-model; for example, the first initialization subunit 1801 may multiply the weights of a hidden layer in the optimized neural network sub-model by a predetermined coefficient and use the results as the weights of the corresponding hidden layer in the common convolutional neural network model.
  • the second converting unit 1802 may convert the ordinary convolutional neural network model into a relaxed convolutional neural network model by using an operation opposite to that of the first converting unit 1701.
  • FIG. 19 is another schematic diagram of the initializing unit 1603 of the second embodiment.
  • the initializing unit 1603 includes: a second initializing subunit 1901, a third training unit 1902, and a third conversion unit 1903.
  • the second initialization subunit 1901 initializes each weight in the common convolutional neural network model according to each weight in the optimized neural network submodel to form an initial common convolutional neural network model;
  • the third training unit 1902 adjusts, based on the known training set, each weight in the initialized common convolutional neural network model to form an adjusted common convolutional neural network model;
  • the third conversion unit 1903 is configured to convert the common convolutional layer in the adjusted common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
  • the processing manner of the second initialization subunit 1901 may be the same as that of the first initialization subunit 1801; for the manner in which the third training unit 1902 adjusts the weights in the common convolutional neural network model, reference may be made to the prior art; and for the manner in which the third conversion unit 1903 converts the common convolutional layer into a relaxed convolutional layer, reference may be made to the second conversion unit 1802.
  • in this embodiment, a small-scale neural network model is trained, a large-scale neural network model is initialized by the small-scale neural network model, and finally the initialized large-scale neural network model is fine-tuned; because the small-scale network has already completed most of the training work, the large-scale network only needs to be fine-tuned for several rounds to converge, thereby avoiding the over-fitting and excessive training time caused by directly training a large-scale neural network.
  • Embodiment 3 of the present application provides an electronic device including the device for training a neural network model as described in Embodiment 2.
  • FIG. 20 is a schematic block diagram showing the system configuration of an electronic device 2000 according to an embodiment of the present invention.
  • the electronic device 2000 can include a central processor 2100 and a memory 2140; the memory 2140 is coupled to the central processor 2100.
  • the figure is exemplary; other types of structures may be used in addition to or in place of the structure to implement telecommunications functions or other functions.
  • the functionality of the device that trains the neural network model can be integrated into the central processor 2100.
  • in one implementation, the central processing unit 2100 can be configured to: extract a part of the neural network model to form a neural network sub-model; train the neural network sub-model to form an optimized neural network sub-model; initialize each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialized neural network model, the initialized neural network model having the same output characteristics as the optimized neural network sub-model; and adjust each weight in the initialized neural network model based on a known training set.
  • the central processing unit 2100 may be further configured to: convert the relaxed convolution layer in the neural network model into a common convolution layer to convert the neural network model into a common convolutional neural network model; extract the A part of the neuron nodes in each hidden layer of the general convolutional neural network model to form the neural network sub-model.
  • the central processing unit 2100 may be further configured to initialize each weight in the common convolutional neural network model according to each weight in the optimized neural network sub-model to form an initialized common convolutional neural network. a model; transforming a common convolutional layer in the initialized general convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
  • the central processing unit 2100 may be further configured to: initialize each weight in the common convolutional neural network model according to each weight in the optimized neural network sub-model to form an initialized common convolutional neural network model; adjust, based on the known training set, each weight in the initialized common convolutional neural network model to form an adjusted common convolutional neural network model; and convert the common convolutional layer in the adjusted common convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
  • the central processing unit 2100 may be further configured to: perform weights of corresponding hidden layers in the common convolutional neural network model according to weights of respective hidden layers in the optimized neural network sub-model Initializing to form an initialization common convolutional neural network model, wherein the output characteristics of the implicit layers of the initialized normal convolutional neural network model are the same as the output characteristics of the hidden layers of the optimized neural network submodel.
  • the central processing unit 2100 may be further configured to: multiply each weight of the hidden layer in the optimized neural network sub-model by a predetermined coefficient as a corresponding implicit in the common convolutional neural network model Contains the weights of the layers.
  • the apparatus for training the neural network model may be configured separately from the central processing unit 2100.
  • for example, the apparatus for training the neural network model may be configured as a chip connected to the central processing unit 2100, and its functionality is implemented under the control of the central processing unit 2100.
  • the electronic device 2000 may further include: a communication module 2110, an input unit 2120, an audio processing unit 2130, a display 2160, and a power source 2170. It should be noted that the electronic device 2000 does not have to include all the components shown in FIG. 20; in addition, the electronic device 2000 may further include components not shown in FIG. 20, and reference may be made to the prior art.
  • the central processor 2100, also sometimes referred to as a controller or an operation control, can include a microprocessor or other processor device and/or logic device, and receives input and controls the operation of each component of the electronic device 2000.
  • the memory 2140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device, and may store a program for executing related information; the central processing unit 2100 can execute the program stored in the memory 2140 to implement information storage, processing, and the like. The functions of the other components are similar to those of the prior art and are not described here.
  • the various components of electronic device 2000 may be implemented by special purpose hardware, firmware, software, or a combination thereof without departing from the scope of the invention.
  • the embodiment of the present application further provides a computer readable program, wherein when the program is executed in an information processing device or an electronic device, the program causes the information processing device or the electronic device to perform the method of training a neural network model described in Embodiment 1.
  • the embodiment of the present application further provides a storage medium storing a computer readable program, wherein the computer readable program causes an information processing device or an electronic device to perform the method of training a neural network model described in Embodiment 1.
  • the apparatus for training a neural network model described in connection with an embodiment of the present invention may be directly embodied as hardware, a software module executed by a processor, or a combination of both.
  • one or more of the functional blocks shown in Figures 16-19 and/or one or more combinations of the functional blocks may correspond to software modules of a computer program flow or to hardware modules.
  • Each software module can also correspond to each hardware module.
  • These software modules may correspond to the respective steps shown in Embodiment 1, respectively.
  • These hardware modules can be implemented, for example, by solidifying these software modules into a Field Programmable Gate Array (FPGA).
  • the software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.
  • a storage medium can be coupled to the processor to enable the processor to read information from, and write information to, the storage medium; or the storage medium can be an integral part of the processor.
  • the processor and the storage medium can be located in an ASIC.
  • the software module can be stored in the memory of the mobile terminal or in a memory card that can be inserted into the mobile terminal.
  • the software module can be stored in the MEGA-SIM card or a large-capacity flash memory device.
  • One or more of the functional blocks described with respect to Figures 16-19 and/or one or more combinations of the functional blocks may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described herein.
  • One or more of the functional blocks described with respect to Figures 16-19 and/or one or more combinations of the functional blocks may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, or any other such configuration.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A method and device for training a neural network model, and an electronic device. The method comprises: extracting a part of a neural network model to form a neural network sub-model; training the neural network sub-model to form an optimized neural network sub-model; initializing weights in the neural network model according to weights in the optimized neural network sub-model to form an initialized neural network model, the initialized neural network model and the optimized neural network sub-model having the same output features; and adjusting the weights in the initialized neural network model on the basis of a known training set. The method can shorten the large-scale neural network training time and avoid the over-fitting problem.

Description

Method, device and electronic device for training a neural network model

Technical Field

The present application relates to the field of information processing technologies, and in particular, to a method, a device, and an electronic device for training a neural network model.

Background Art

In recent years, classification methods based on the Convolutional Neural Network (CNN) have achieved great success in the field of handwritten character recognition.

The CNN model is a hierarchical model. FIG. 1 is a schematic diagram of the CNN model. As shown in FIG. 1, the CNN model is composed of an input layer 101, a plurality of hidden layers 102, and an output layer 103. The input layer 101 provides the data to be processed corresponding to the sample to be identified; when the sample to be identified is a grayscale image, the data to be processed is a two-dimensional matrix. The type of a hidden layer 102 may be a common convolutional layer, a relaxed convolutional layer, a pooling layer, a neuron layer, or a fully connected layer, and each type of hidden layer provides a specific operation to process the data. The output layer 103 provides the final result of the model; for a CNN model used for classification, the output layer 103 outputs the probability that the sample to be identified belongs to each class.

It should be noted that the above description of the technical background is only intended to facilitate a clear and complete explanation of the technical solutions of the present invention and to facilitate the understanding of those skilled in the art. These technical solutions should not be considered to be well known to those skilled in the art merely because they are set forth in the background section of the present invention.
Summary of the Invention

Many published experimental results show that the larger the scale of the CNN model, the more accurate the recognition result of the sample to be identified. However, a large-scale CNN model has the following problems during training: a) the larger the model, the easier it is to overfit; b) the larger the model, the longer the training time required.

The embodiments of the present application provide a method, a device, and an electronic device for training a neural network model, in which a small-scale neural network model is trained, a large-scale neural network model is initialized by the small-scale neural network model, and finally the initialized large-scale neural network model is fine-tuned, thereby avoiding problems such as over-fitting and excessive training time caused by directly training a large-scale neural network.

According to a first aspect of the embodiments of the present application, there is provided a method for training a neural network model, for determining the weights in the neural network model, the method comprising:

extracting a part of the neural network model to form a neural network sub-model;

training the neural network sub-model to form an optimized neural network sub-model;

initializing each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialized neural network model, the initialized neural network model and the optimized neural network sub-model having the same output characteristics; and

adjusting each weight in the initialized neural network model based on a known training set.
According to a second aspect of the embodiments of the present application, extracting a part of the neural network model comprises:

converting the relaxed convolutional layer in the neural network model into a common convolutional layer, to convert the neural network model into a common convolutional neural network model; and

extracting a part of the neuron nodes in each hidden layer of the common convolutional neural network model to form the neural network sub-model.

According to a third aspect of the embodiments of the present application, initializing each weight in the neural network model to form the initialized neural network model comprises:

initializing each weight in the common convolutional neural network model according to each weight in the optimized neural network sub-model, to form an initialized common convolutional neural network model; and

converting the common convolutional layer in the initialized common convolutional neural network model into a relaxed convolutional layer, to form the initialized neural network model.

According to a fourth aspect of the embodiments of the present application, initializing each weight in the neural network model to form the initialized neural network model comprises:

initializing each weight in the common convolutional neural network model according to each weight in the optimized neural network sub-model, to form an initialized common convolutional neural network model;

adjusting each weight in the initialized common convolutional neural network model based on a known training set, to form an adjusted common convolutional neural network model; and

converting the common convolutional layer in the adjusted common convolutional neural network model into a relaxed convolutional layer, to form the initialized neural network model.

According to a fifth aspect of the embodiments of the present application, initializing each weight in the common convolutional neural network model to form the initialized common convolutional neural network model comprises:

initializing the weights of the corresponding hidden layers in the common convolutional neural network model according to the weights of the hidden layers in the optimized neural network sub-model, to form the initialized common convolutional neural network model, wherein the output characteristics of the hidden layers of the initialized common convolutional neural network model are the same as the output characteristics of the hidden layers of the optimized neural network sub-model.

According to a sixth aspect of the embodiments of the present application, initializing the weights of the corresponding hidden layers in the common convolutional neural network model according to the weights of the hidden layers in the optimized neural network sub-model comprises:

multiplying the weights of a hidden layer in the optimized neural network sub-model by a predetermined coefficient, and using the results as the weights of the corresponding hidden layer in the common convolutional neural network model.
根据本申请实施例的第七方面,提供一种对神经网络模型进行训练的装置,用于确定神经网络模型中的各权值,该装置包括:According to a seventh aspect of the embodiments of the present application, there is provided an apparatus for training a neural network model for determining weights in a neural network model, the apparatus comprising:
提取单元,其用于提取神经网络模型的一部分,以形成神经网络子模型;An extracting unit for extracting a portion of the neural network model to form a neural network sub-model;
第一训练单元,其用于对所述神经网络子模型进行训练,以形成优化的神经网络子模型;a first training unit for training the neural network sub-model to form an optimized neural network sub-model;
初始化单元,其根据所述优化的神经网络子模型中的各权值,初始化所述神经网络模型中的各权值,以形成初始化神经网络模型,并且,所述初始化神经网络模型与所述优化的神经网络子模型具有相同的输出特性;An initialization unit that initializes each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialization neural network model, and the initialization neural network model and the optimization Neural network submodels have the same output characteristics;
第二训练单元,其基于已知训练集,对所述初始化神经网络模型中的各权值进行调整。A second training unit that adjusts weights in the initialized neural network model based on a known training set.
根据本申请实施例的第八方面,其中,所述提取单元包括:According to an eighth aspect of the embodiments of the present application, the extracting unit includes:
第一转化单元,其用于将所述神经网络模型中的松弛卷积层转化为普通卷积层,以将所述神经网络模型转化为普通卷积神经网络模型;以及a first conversion unit for converting a relaxed convolutional layer in the neural network model into a common convolutional layer to convert the neural network model into a general convolutional neural network model;
提取子单元,其用于提取所述普通卷积神经网络模型的每一个隐含层中的部分神经元节点,以形成所述神经网络子模型。An extraction subunit is configured to extract a partial neuron node in each hidden layer of the common convolutional neural network model to form the neural network submodel.
根据本申请实施例的第九方面,其中,所述初始化单元包括:According to a ninth aspect of the embodiments of the present application, the initialization unit includes:
第一初始化子单元,其用于根据所述优化的神经网络子模型中的各权值,初始化所述普通卷积神经网络模型中的各权值,以形成初始化普通卷积神经网络模型;以及a first initialization subunit, configured to initialize each weight in the common convolutional neural network model according to each weight in the optimized neural network submodel to form an initialization common convolutional neural network model;
第二转化单元,其用于将所述初始化普通卷积神经网络模型中的普通卷积层转化为松弛卷积层,以形成所述初始化神经网络模型。a second transformation unit for converting a common convolutional layer in the initialized general convolutional neural network model into a relaxed convolutional layer to form the initialization neural network model.
根据本申请实施例的第十方面,其中,所述初始化单元包括:According to a tenth aspect of the embodiments of the present application, the initializing unit includes:
第二初始化子单元，根据所述优化的神经网络子模型中的各权值，初始化所述普通卷积神经网络模型中的各权值，以形成初始化普通卷积神经网络模型；a second initialization subunit that initializes the weights in the common convolutional neural network model according to the weights in the optimized neural network sub-model, so as to form an initialized common convolutional neural network model;
第三训练单元,其基于已知训练集,对初始化普通卷积神经网络模型中的各权值进行调整,以形成调整后普通卷积神经网络模型;以及a third training unit that adjusts weights in the initialized general convolutional neural network model based on a known training set to form an adjusted general convolutional neural network model;
第三转化单元,其用于将所述调整后普通卷积神经网络模型中的普通卷积层转化为松弛卷积层,以形成所述初始化神经网络模型。a third transformation unit for converting a common convolutional layer in the adjusted general convolutional neural network model into a relaxed convolutional layer to form the initialization neural network model.
根据本申请实施例的第十一方面，其中，所述第一初始化子单元根据所述优化的神经网络子模型中的各隐含层的权值，对所述普通卷积神经网络模型中对应的隐含层的权值进行初始化，以形成初始化普通卷积神经网络模型，其中，所述初始化普通卷积神经网络模型的各隐含层的输出特性与所述优化的神经网络子模型的各隐含层的输出特性相同。According to an eleventh aspect of the embodiments of the present application, wherein the first initialization subunit initializes the weights of the corresponding hidden layers in the common convolutional neural network model according to the weights of the hidden layers in the optimized neural network sub-model, so as to form an initialized common convolutional neural network model, wherein the output characteristics of the hidden layers of the initialized common convolutional neural network model are the same as the output characteristics of the hidden layers of the optimized neural network sub-model.
根据本申请实施例的第十二方面，其中，所述第一初始化子单元将所述优化的神经网络子模型中的隐含层的各权值乘以预定的系数，作为所述普通卷积神经网络模型中的对应的隐含层的各权值。According to a twelfth aspect of the embodiments of the present application, wherein the first initialization subunit multiplies the weights of a hidden layer in the optimized neural network sub-model by a predetermined coefficient, and uses the results as the weights of the corresponding hidden layer in the common convolutional neural network model.
根据本申请实施例的第十三方面,提供一种电子设备,包括上述实施例第七至第十二方面的任一方面所述的对神经网络模型进行训练的装置。According to a thirteenth aspect of the embodiments of the present application, there is provided an electronic device comprising the apparatus for training a neural network model according to any one of the seventh to twelfth aspects of the embodiments.
根据本申请实施例的第十四方面，提供一种计算机可读程序，其中当在对神经网络模型进行训练的装置或电子设备中执行所述程序时，所述程序使得在所述对神经网络模型进行训练的装置或电子设备执行上述实施例第一至第六方面的任一项方面所述的对神经网络模型进行训练的方法。According to a fourteenth aspect of the embodiments of the present application, there is provided a computer readable program, wherein, when the program is executed in a device or an electronic apparatus for training a neural network model, the program causes the device or electronic apparatus for training a neural network model to perform the method of training a neural network model according to any one of the first to sixth aspects of the above embodiments.
根据本申请实施例的第十五方面,提供一种存储有计算机可读程序的存储介质,其中所述存储介质存储上述实施例第十四方面的计算机可读程序,所述计算机可读程序使得对神经网络模型进行训练的装置或电子设备执行上述实施例第一至第六方面的任一项方面所述的对神经网络模型进行训练的方法。According to a fifteenth aspect of the embodiments of the present application, there is provided a storage medium storing a computer readable program, wherein the storage medium stores the computer readable program of the fourteenth aspect of the above embodiment, the computer readable program A device or an electronic device that trains a neural network model performs the method of training a neural network model as described in any one of the first to sixth aspects of the above embodiments.
本申请实施例的有益效果在于:缩短大规模神经网络的训练时间并避免过拟合问题。The beneficial effects of the embodiments of the present application are: shortening the training time of large-scale neural networks and avoiding over-fitting problems.
参照后文的说明和附图，详细公开了本发明的特定实施方式，指明了本发明的原理可以被采用的方式。应该理解，本发明的实施方式在范围上并不因而受到限制。在所附权利要求的条款的范围内，本发明的实施方式包括许多改变、修改和等同。Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not thereby limited in scope. Within the scope of the terms of the appended claims, the embodiments of the invention include many changes, modifications and equivalents.
针对一种实施方式描述和/或示出的特征可以以相同或类似的方式在一个或更多个其它实施方式中使用，与其它实施方式中的特征相组合，或替代其它实施方式中的特征。Features described and/or illustrated for one embodiment may be used in the same or a similar manner in one or more other embodiments, in combination with features in other embodiments, or in place of features in other embodiments.
应该强调，术语“包括/包含”在本文使用时指特征、整件、步骤或组件的存在，但并不排除一个或更多个其它特征、整件、步骤或组件的存在或附加。It should be emphasized that the term "comprising/including", when used herein, refers to the presence of a feature, a whole, a step or a component, but does not exclude the presence or addition of one or more other features, wholes, steps or components.
附图说明 BRIEF DESCRIPTION OF THE DRAWINGS
在本发明实施例的一个附图或一种实施方式中描述的元素和特征可以与一个或更多个其它附图或实施方式中示出的元素和特征相结合。此外，在附图中，类似的标号表示几个附图中对应的部件，并可用于指示多于一种实施方式中使用的对应部件。The elements and features described in one of the drawings or one embodiment of the embodiments of the invention may be combined with the elements and features shown in one or more other drawings or embodiments. Furthermore, in the drawings, like reference numerals denote corresponding parts in the several drawings and may be used to indicate corresponding parts used in more than one embodiment.
所包括的附图用来提供对本发明实施例的进一步的理解，其构成了说明书的一部分，用于例示本发明的实施方式，并与文字描述一起来阐释本发明的原理。显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。在附图中：The accompanying drawings are included to provide a further understanding of the embodiments of the invention; they constitute a part of the specification, illustrate embodiments of the invention, and, together with the written description, serve to explain the principles of the invention. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained from these drawings by a person of ordinary skill in the art without inventive effort. In the drawings:
图1是CNN模型的示意图;Figure 1 is a schematic diagram of a CNN model;
图2是实施例1的神经网络模型的一个示意图;2 is a schematic diagram of a neural network model of Embodiment 1;
图3是实施例1的对神经网络模型进行训练的方法的一个示意图;3 is a schematic diagram of a method of training a neural network model of Embodiment 1;
图4是实施例1的提取神经网络模型的一部分的方法的一个示意图;4 is a schematic diagram of a method of extracting a portion of a neural network model of Embodiment 1;
图5是实施例1的普通卷积神经网络模型的一个示意图;5 is a schematic diagram of a conventional convolutional neural network model of Embodiment 1;
图6是实施例1的松弛卷积层的处理方式的一个示意图;Figure 6 is a schematic view showing a treatment mode of the slack convolution layer of the first embodiment;
图7是实施例1的普通卷积层的处理方式的一个示意图;7 is a schematic diagram showing a processing manner of a general convolution layer of Embodiment 1;
图8是实施例1的神经网络子模型的一个示意图;8 is a schematic diagram of a neural network sub-model of Embodiment 1;
图9是实施例1对神经网络模型中的各权值进行初始化的方法的一个示意图;9 is a schematic diagram of a method for initializing weights in a neural network model in Embodiment 1;
图10(A)是优化后的神经网络子模型的输入层和卷积层的示意图;Figure 10 (A) is a schematic diagram of an input layer and a convolution layer of the optimized neural network sub-model;
图10(B)是初始化后的普通卷积神经网络模型的输入层和卷积层的示意图;Figure 10 (B) is a schematic diagram of an input layer and a convolution layer of an ordinary convolutional neural network model after initialization;
图11(A)是优化后的神经网络子模型的池化层和卷积层的示意图;Figure 11 (A) is a schematic diagram of a pooled layer and a convolutional layer of the optimized neural network sub-model;
图11(B)是初始化后的普通卷积神经网络模型的池化层和卷积层的示意图;Figure 11 (B) is a schematic diagram of a pooling layer and a convolution layer of an ordinary convolutional neural network model after initialization;
图12(A)是图11(A)的池化层和卷积层的一部分示意图;Figure 12 (A) is a partial schematic view of the pooling layer and the convolution layer of Figure 11 (A);
图12(B)是图11(B)的池化层和卷积层的一部分示意图;Figure 12 (B) is a partial schematic view of the pooling layer and the convolution layer of Figure 11 (B);
图13(A)是优化后的神经网络子模型的全连接层及其前一隐含层的示意图; Figure 13 (A) is a schematic diagram of the fully connected layer of the optimized neural network sub-model and its previous hidden layer;
图13(B)是初始化后的普通卷积神经网络模型的全连接层及其前一隐含层的示意图;Figure 13 (B) is a schematic diagram of the fully connected layer of the normal convolutional neural network model after initialization and its previous hidden layer;
图14(A)是图13(A)的全连接层及其前一隐含层的一部分示意图;Figure 14 (A) is a partial schematic view of the fully connected layer of Figure 13 (A) and its previous hidden layer;
图14(B)是图13(B)的全连接层及其前一隐含层的一部分示意图;Figure 14 (B) is a partial schematic view of the fully connected layer of Figure 13 (B) and its previous hidden layer;
图15是实施例1对神经网络模型中的各权值进行初始化的方法的另一个示意图;15 is another schematic diagram of a method for initializing weights in a neural network model in Embodiment 1;
图16是实施例2的对神经网络模型进行训练的装置的一个示意图;16 is a schematic diagram of an apparatus for training a neural network model of Embodiment 2;
图17是实施例2的提取单元的一个示意图;Figure 17 is a schematic illustration of the extraction unit of Embodiment 2;
图18是实施例2的初始化单元的一个示意图;Figure 18 is a schematic diagram of an initialization unit of Embodiment 2;
图19是实施例2的初始化单元的另一个示意图;Figure 19 is another schematic diagram of the initialization unit of Embodiment 2;
图20是实施例3的电子设备2000的系统构成的一示意框图。20 is a schematic block diagram showing the system configuration of the electronic device 2000 of the third embodiment.
具体实施方式 DETAILED DESCRIPTION
参照附图，通过下面的说明书，本发明的前述以及其它特征将变得明显。在说明书和附图中，具体公开了本发明的特定实施方式，其表明了其中可以采用本发明的原则的部分实施方式，应了解的是，本发明不限于所描述的实施方式，相反，本发明包括落入所附权利要求的范围内的全部修改、变型以及等同物。下面结合附图对本发明的各种实施方式进行说明。这些实施方式只是示例性的，不是对本发明的限制。The foregoing and other features of the present invention will become apparent from the following description taken in conjunction with the accompanying drawings. In the description and drawings, specific embodiments of the invention are disclosed in detail, indicating some of the embodiments in which the principles of the invention may be employed. It should be understood that the invention is not limited to the described embodiments; on the contrary, the invention includes all modifications, variations and equivalents falling within the scope of the appended claims. Various embodiments of the present invention will be described below with reference to the accompanying drawings. These embodiments are merely exemplary and are not limiting of the invention.
实施例1Example 1
本申请实施1提供一种对神经网络模型进行训练的方法,该方法用于确定神经网络模型中的各权值。 Embodiment 1 of the present application provides a method of training a neural network model for determining weights in a neural network model.
图2是本实施例的神经网络模型的一个示意图,如图2所示,该神经网络模型200包括输入层201、卷积层202、池化层203、松弛卷积层204、全连接层205、以及输出层206,其中,卷积层202、池化层203、松弛卷积层204以及全连接层205均为隐含层。2 is a schematic diagram of a neural network model of the present embodiment. As shown in FIG. 2, the neural network model 200 includes an input layer 201, a convolution layer 202, a pooling layer 203, a relaxed convolution layer 204, and a fully connected layer 205. And the output layer 206, wherein the convolution layer 202, the pooling layer 203, the relaxed convolution layer 204, and the fully connected layer 205 are all hidden layers.
在本实施例中，输入层201可以输入待识别数据2011；卷积层202、池化层203、松弛卷积层204、全连接层205和输出层206中的每一层都接收上一层所输出的数据，并采用与本层相对应的权值对数据进行处理，以分别生成本层所输出的数据并从本层的神经元节点（neuron）输出，各层的神经元节点分别为2021-2024、2031-2034、2041-2046、2051-2058和2061-20610，并且，根据输出层206的神经元节点所输出的数据，可以确定待识别数据2011属于每一个类别206a的概率；此外，在图2中，仅标记出了神经元节点2021、2024、2031、2034、2041、2046、2051、2058、2061以及20610，并没有标记出其它神经元节点。In this embodiment, the input layer 201 can input the data to be identified 2011; each of the convolutional layer 202, the pooling layer 203, the relaxed convolutional layer 204, the fully connected layer 205 and the output layer 206 receives the data output by the previous layer and processes that data with the weights corresponding to the layer, so as to generate the data output by the layer and output it from the neuron nodes of the layer. The neuron nodes of the respective layers are 2021-2024, 2031-2034, 2041-2046, 2051-2058 and 2061-20610, and, based on the data output by the neuron nodes of the output layer 206, the probability that the data to be identified 2011 belongs to each category 206a can be determined. In addition, in FIG. 2, only neuron nodes 2021, 2024, 2031, 2034, 2041, 2046, 2051, 2058, 2061 and 20610 are labeled; the other neuron nodes are not labeled.
如图2所示,在使用神经网络模型200来进行手写数字图像的识别时,输入层201输入的待识别数据2011可以是手写数字图像,卷积层202、池化层203和松弛卷积层204的神经元节点所输出的数据可以是特征图(feature map),全连接层205和输出层206的神经元节点所输出的数据可以是数值。数字0-9中的一个数字可以与一个类别206a对应。因此,根据输出层206所输出的数据,可以确定待识别数据2011属于0-9中每一个数字的概率。As shown in FIG. 2, when the neural network model 200 is used to identify the handwritten digital image, the data to be identified 2011 input by the input layer 201 may be a handwritten digital image, a convolution layer 202, a pooling layer 203, and a slack convolution layer. The data output by the neuron node of 204 may be a feature map, and the data output by the neuron nodes of the fully connected layer 205 and the output layer 206 may be numerical values. A number in the numbers 0-9 may correspond to a category 206a. Therefore, based on the data output by the output layer 206, the probability that the data to be identified 2011 belongs to each of 0-9 can be determined.
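The following Python sketch (illustrative only, not part of the original disclosure) shows how such a layer stack turns an input image into per-class probabilities. The image size, kernel size, class count and the helper functions conv2d, max_pool2 and softmax are assumptions made for this example, and the relaxed convolutional layer is omitted here (it is illustrated separately below).

```python
import numpy as np

def conv2d(x, k):
    # Naive 'valid' 2-D convolution of a single-channel image x with kernel k.
    h, w = x.shape; kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

def max_pool2(x):
    # 2x2 max pooling (assumes even spatial dimensions).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax(z):
    # Normalize scores into class probabilities.
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((28, 28))                 # data to be identified (e.g. a handwritten digit)
k1 = rng.standard_normal((5, 5))             # weight of a convolutional layer
feat = max_pool2(conv2d(image, k1))          # convolutional layer + pooling layer
fc_w = rng.standard_normal((feat.size, 10))  # fully connected weights, 10 classes (digits 0-9)
probs = softmax(feat.reshape(-1) @ fc_w)     # output layer: probability of each class
print(probs.sum())                           # probabilities over the 10 classes sum to 1
```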
在神经网络模型200中,每一层所对应的权值选取得合适,才能保证输出层206所输出的分类结果准确,其中,每一层所对应的权值可以是m*n的矩阵,m和n均为自然数。In the neural network model 200, the weights corresponding to each layer are selected to ensure that the classification result output by the output layer 206 is accurate, wherein the weight corresponding to each layer may be a matrix of m*n, m And n are both natural numbers.
本实施例的对神经网络模型进行训练的方法,就是用于确定神经网络模型中各层所对应的权值。The method for training the neural network model in this embodiment is for determining the weight corresponding to each layer in the neural network model.
在本实施例的下述说明中,将以图2所示的神经网络模型200为例来说明本实施例的对神经网络模型进行训练的方法。In the following description of the present embodiment, a method of training the neural network model of the present embodiment will be described by taking the neural network model 200 shown in FIG. 2 as an example.
需要说明的是,在本实施例中,神经网络模型200具有松弛卷积层204,因此,该神经网络模型200属于卷积神经网络模型,当然,本实施例的神经网络模型200也可以不具有松弛卷积层204,本实施例对此并不做限定;并且,本实施例所描述的对神经网络模型进行训练的方法不仅适用于卷积神经网络模型,也适用于其它的神经网络模型。It should be noted that, in this embodiment, the neural network model 200 has a relaxed convolution layer 204. Therefore, the neural network model 200 belongs to a convolutional neural network model. Of course, the neural network model 200 of the present embodiment may not have The relaxation convolution layer 204 is not limited in this embodiment; and the method for training the neural network model described in this embodiment is applicable not only to the convolutional neural network model but also to other neural network models.
图3是本实施例的对神经网络模型进行训练的方法的一个示意图,如图3所示,该方法包括:FIG. 3 is a schematic diagram of a method for training a neural network model according to the embodiment. As shown in FIG. 3, the method includes:
S301、提取神经网络模型的一部分,以形成神经网络子模型;S301. Extract a part of a neural network model to form a neural network sub-model;
S302、对所述神经网络子模型进行训练,以形成优化的神经网络子模型;S302. Train the neural network submodel to form an optimized neural network submodel.
S303、根据所述优化的神经网络子模型中的各权值,初始化所述神经网络模型中的各权值,以形成初始化神经网络模型,并且,所述初始化神经网络模型与所述优化的神经网络子模型具有相同的输出特性; S303. Initialize each weight in the neural network model according to each weight in the optimized neural network submodel to form an initialization neural network model, and initialize the neural network model and the optimized neural network. The network submodel has the same output characteristics;
S304、基于已知训练集,对所述初始化神经网络模型中的各权值进行调整。S304. Adjust, according to the known training set, the weights in the initialized neural network model.
根据本申请的实施例,通过对规模较小的神经网络子模型进行训练,由训练后的该子模型对规模较大的神经网络模型进行初始化,进而对规模较大的神经网络模型进行微调,由此,与直接对规模较大的神经网络模型进行训练的方法相比,本实施例的方法能够避免过拟合和训练时间过长等问题。According to an embodiment of the present application, by training a small-scale neural network sub-model, the trained sub-model is used to initialize a large-scale neural network model, and then fine-tuning a large-scale neural network model. Thus, the method of the present embodiment can avoid problems such as overfitting and excessive training time as compared with a method of directly training a large-scale neural network model.
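As a minimal illustration of why this works, the sketch below (Python with made-up sizes, not taken from the patent text) does not train anything; it only shows that a larger model whose hidden neurons are duplicates of a smaller model's neurons, with the outgoing weights halved, produces exactly the same outputs as the smaller model, which is the property the initialization in step S303 relies on. Linear activations are assumed for brevity; with elementwise nonlinearities the duplicated hidden activations are still identical, so the argument is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sub-model: one hidden layer with 3 neurons; the "full" model below has
# twice as many hidden neurons, mirroring a 50% extraction ratio (assumed setup).
x = rng.random(4)                      # one input sample
W1_sub = rng.standard_normal((4, 3))   # input -> hidden (sub-model)
W2_sub = rng.standard_normal((3, 2))   # hidden -> output (sub-model, 2 classes)

# S303-style initialization: duplicate each hidden neuron, and halve the
# outgoing weights so the duplicated contributions sum to the original value.
W1_full = np.concatenate([W1_sub, W1_sub], axis=1)        # coefficient 1 on incoming weights
W2_full = np.concatenate([W2_sub, W2_sub], axis=0) * 0.5  # coefficient 1/2 on outgoing weights

out_sub = x @ W1_sub @ W2_sub
out_full = x @ W1_full @ W2_full
print(np.allclose(out_sub, out_full))  # True: same output characteristics before fine-tuning
```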
图4是本实施例的提取神经网络模型的一部分的一个方法的示意图,如图4所示,该方法包括:4 is a schematic diagram of a method of extracting a part of a neural network model of the embodiment, as shown in FIG. 4, the method includes:
S401、将所述神经网络模型中的松弛卷积层转化为普通卷积层,以将所述神经网络模型转化为普通卷积神经网络模型;以及S401. Convert a relaxed convolution layer in the neural network model into a common convolution layer to convert the neural network model into a common convolutional neural network model;
S402、提取所述普通卷积神经网络模型的每一个隐含层中的部分神经元节点,以形成所述神经网络子模型。S402. Extract a partial neuron node in each hidden layer of the common convolutional neural network model to form the neural network sub-model.
在步骤S401中,将图2的神经网络模型200中的松弛卷积层204转化为普通卷积层,从而将神经网络模型200从卷积神经网络模型转化为普通卷积神经网络模型。In step S401, the relaxed convolutional layer 204 in the neural network model 200 of FIG. 2 is converted into a normal convolutional layer, thereby transforming the neural network model 200 from a convolutional neural network model to a general convolutional neural network model.
图5是该普通卷积神经网络模型500的一个示意图,其中,松弛卷积层204被转化为普通卷积层504,普通卷积神经网络模型500的其他层与神经网络模型200相同。5 is a schematic diagram of the conventional convolutional neural network model 500 in which the relaxed convolutional layer 204 is transformed into a common convolutional layer 504, and the other layers of the conventional convolutional neural network model 500 are identical to the neural network model 200.
普通卷积层504的数据处理方式与松弛卷积层204的数据处理方式的不同之处在于，在普通卷积层504中，参与卷积操作的同一个神经元节点内的不同位置的数据共享一个权值，而在松弛卷积层204中，参与卷积操作的同一个神经元节点内的不同位置的数据不共享任何一个权值。The data processing of the ordinary convolutional layer 504 differs from that of the relaxed convolutional layer 204 in that, in the ordinary convolutional layer 504, the data at different positions within the same neuron node participating in the convolution operation share one weight, whereas in the relaxed convolutional layer 204 the data at different positions within the same neuron node participating in the convolution operation do not share any weight.
图6是本实施例的松弛卷积层204的处理方式的一个示意图，如图6所示，P1和P2是参与卷积操作的不同的神经元节点，P11、P14是P1中的不同位置的数据，P21、P24是P2中的不同位置的数据，W11、W14、W21、W24是不同的权值，T11、T14是卷积操作后生成的数据，其中，T11和T14的计算方式如下式(1)、(2)所示：FIG. 6 is a schematic diagram of the processing of the relaxed convolutional layer 204 of this embodiment. As shown in FIG. 6, P1 and P2 are different neuron nodes participating in the convolution operation, P11 and P14 are data at different positions in P1, P21 and P24 are data at different positions in P2, W11, W14, W21 and W24 are different weights, and T11 and T14 are data generated by the convolution operation, where T11 and T14 are calculated as shown in the following equations (1) and (2):

T11 = P11*W11 + P21*W21    (1)

T14 = P14*W14 + P24*W24    (2)
图6和式(1)、(2)中仅示出了T11、T14的计算方式,T12、T13的计算方式与之类似,本实施例不再详细说明。In Figure 6 and equations (1) and (2), only the calculation methods of T11 and T14 are shown. The calculation methods of T12 and T13 are similar, and this embodiment will not be described in detail.
如图6和式(1)、(2)所示，神经元节点P1中的数据P11和P14分别对应独立的权值W11、W14，神经元节点P2中的数据P21和P24分别对应独立的权值W21、W24，也就是说，同一个神经元节点内的不同位置的数据不共享任何一个权值。As shown in FIG. 6 and equations (1) and (2), the data P11 and P14 in neuron node P1 correspond to the independent weights W11 and W14, respectively, and the data P21 and P24 in neuron node P2 correspond to the independent weights W21 and W24, respectively; that is, data at different positions within the same neuron node do not share any weight.
图7是本实施例的普通卷积层504的处理方式的一个示意图，P1、P2、P11、P14、P21、P24、T11和T14的含义与图6相同，W1和W2是不同的权值，其中，T11和T14的计算方式如下式(3)、(4)所示：FIG. 7 is a schematic diagram of the processing of the ordinary convolutional layer 504 of this embodiment. The meanings of P1, P2, P11, P14, P21, P24, T11 and T14 are the same as in FIG. 6, and W1 and W2 are different weights, where T11 and T14 are calculated as shown in the following equations (3) and (4):

T11 = P11*W1 + P21*W2    (3)

T14 = P14*W1 + P24*W2    (4)
如图7和式(3)、(4)所示,神经元节点P1中的不同位置的数据P11和P14共享权值W1,神经元节点P2中的不同位置的数据P21和P24共享权值W2。As shown in FIG. 7 and equations (3) and (4), the data P11 and P14 at different positions in the neuron node P1 share the weight W1, and the data P21 and P24 at different positions in the neuron node P2 share the weight W2. .
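The difference can be made concrete with a few lines of Python; the numbers below are made up, and the expressions simply restate equations (1)-(4) for two positions and two input nodes.

```python
# Data at two positions of input nodes P1 and P2 (made-up values for illustration).
P11, P14 = 0.2, 0.7   # two positions inside node P1
P21, P24 = 0.5, 0.1   # two positions inside node P2

# Relaxed convolutional layer: independent weights per position, as in (1) and (2).
W11, W14, W21, W24 = 1.0, 2.0, 3.0, 4.0
T11_relaxed = P11 * W11 + P21 * W21
T14_relaxed = P14 * W14 + P24 * W24

# Ordinary convolutional layer: one weight per input node, shared by all positions,
# as in (3) and (4).
W1, W2 = 1.0, 3.0
T11_ordinary = P11 * W1 + P21 * W2
T14_ordinary = P14 * W1 + P24 * W2

print(T11_relaxed, T14_relaxed, T11_ordinary, T14_ordinary)
```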
在本实施例的步骤S401中，神经网络模型200的松弛卷积层204中的部分权值可以被删除，以减少权值的数量，由此，使得参与卷积操作的同一个神经元节点内的数据共享一个权值，从而将松弛卷积层204转化为普通卷积层504，以将神经网络模型200转化为普通卷积神经网络模型500。In step S401 of this embodiment, part of the weights in the relaxed convolutional layer 204 of the neural network model 200 may be deleted to reduce the number of weights, so that the data within the same neuron node participating in the convolution operation share one weight, thereby converting the relaxed convolutional layer 204 into the ordinary convolutional layer 504 and converting the neural network model 200 into the ordinary convolutional neural network model 500.
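A possible sketch of this weight-deletion step is shown below; which of the per-position weights is retained (here simply the first) is an assumption of this example, since the text only states that part of the weights are deleted so that one weight per node remains.

```python
import numpy as np

# Relaxed-layer weights: one weight per (input node, output position); shape
# (num_input_nodes, num_positions). Values are made up for illustration.
relaxed_w = np.array([[1.0, 2.0],    # W11, W14 for node P1
                      [3.0, 4.0]])   # W21, W24 for node P2

# Step S401 sketch: drop the extra per-position weights so that all positions of
# one input node share a single weight (the first one is kept here by assumption).
ordinary_w = relaxed_w[:, 0]         # -> W1 = 1.0, W2 = 3.0
print(ordinary_w)
```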
在本实施例的步骤S402中，普通卷积神经网络模型500的每一个隐含层的神经元节点可以按照一定的比例被删除，从而得到神经网络子模型，其中，每一个隐含层被删除的神经元节点的比例可以相同，也可以不同。In step S402 of this embodiment, the neuron nodes of each hidden layer of the ordinary convolutional neural network model 500 may be deleted according to a certain ratio, so as to obtain the neural network sub-model, where the ratio of deleted neuron nodes may be the same or different for each hidden layer.
图8是本实施例的神经网络子模型800的一个示意图，如图8所示，在图5的普通卷积神经网络模型500的基础上，每一个隐含层的神经元被删除的比例均为50%，从而形成了神经网络子模型800，其中，图8的801为从普通卷积神经网络模型500中删除的神经元节点，图8的输入层201与输出层206分别与图5的输入层201与输出层206相同，图8的卷积层802、池化层803、卷积层804以及全连接层805与图5的卷积层202、池化层203、普通卷积层504以及全连接层205分别对应。FIG. 8 is a schematic diagram of the neural network sub-model 800 of this embodiment. As shown in FIG. 8, on the basis of the ordinary convolutional neural network model 500 of FIG. 5, 50% of the neurons of each hidden layer are deleted, thereby forming the neural network sub-model 800, where 801 in FIG. 8 denotes neuron nodes deleted from the ordinary convolutional neural network model 500, the input layer 201 and the output layer 206 of FIG. 8 are the same as the input layer 201 and the output layer 206 of FIG. 5, respectively, and the convolutional layer 802, the pooling layer 803, the convolutional layer 804 and the fully connected layer 805 of FIG. 8 correspond to the convolutional layer 202, the pooling layer 203, the ordinary convolutional layer 504 and the fully connected layer 205 of FIG. 5, respectively.
在本实施例的步骤S401和步骤S402中,先将神经网络模型200转化为普通卷积神经网络模型,然后对普通卷积神经网络模型进行神经元节点的删除,以得到神经网络子模型800,其中,将神经网络模型200转化为普通卷积神经网络模型的目的在于,能够使随后生成的神经网络子模型800中的权值的个数减少,避免过拟合。但是,本实施例并不限于此,如果神经网络模型200中不具有松弛卷积层204,则可以直接对神经网络模型200进行神经元节点的删除处理。In step S401 and step S402 of the embodiment, the neural network model 200 is first converted into a common convolutional neural network model, and then the neuron node is deleted from the common convolutional neural network model to obtain the neural network sub-model 800. Among them, the purpose of transforming the neural network model 200 into a common convolutional neural network model is to reduce the number of weights in the subsequently generated neural network sub-model 800 and avoid over-fitting. However, the present embodiment is not limited thereto. If the neural network model 200 does not have the slack convolution layer 204, the neuron node 200 can be directly deleted by the neural network model 200.
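The channel-slicing sketch below (Python/NumPy, with assumed layer sizes) illustrates how keeping half of the neuron nodes of a hidden layer translates into slicing the corresponding weight tensors; a real implementation would apply the same slicing to every hidden layer and to biases as well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ordinary convolutional layer of the big model: 4 output feature maps (neuron
# nodes), 2 input maps, 3x3 kernels -- sizes are made up for illustration.
big_kernels = rng.standard_normal((4, 2, 3, 3))

# Step S402 sketch: keep only half of the neuron nodes of this hidden layer
# (e.g. every second output map) to build the corresponding layer of the sub-model.
keep = np.arange(0, 4, 2)                 # indices of the retained nodes
sub_kernels = big_kernels[keep]           # shape (2, 2, 3, 3)

# If the *previous* layer was also halved, the input channels of this layer must
# be sliced consistently as well:
sub_kernels = sub_kernels[:, :1]          # shape (2, 1, 3, 3)
print(sub_kernels.shape)
```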
在本实施例的步骤S302中，可以根据已知的训练集，对神经网络子模型800进行训练，以确定其中的各权值的优化值，从而将神经网络子模型800训练为优化的神经网络子模型。其中，对神经网络子模型800进行训练的方法可以参考现有技术，本实施例不再赘述。In step S302 of this embodiment, the neural network sub-model 800 may be trained on a known training set to determine optimized values of its weights, thereby training the neural network sub-model 800 into the optimized neural network sub-model. For the method of training the neural network sub-model 800, reference may be made to the prior art, and it is not described again in this embodiment.
图9是本实施例对神经网络模型中的各权值进行初始化的方法的一个示意图,用于实现步骤S303。如图9所示,该方法包括:FIG. 9 is a schematic diagram of a method for initializing each weight in the neural network model according to the embodiment, for implementing step S303. As shown in FIG. 9, the method includes:
S901、根据所述优化的神经网络子模型中的各权值,初始化所述普通卷积神经网络模型中的各权值,以形成初始化普通卷积神经网络模型;S901: Initialize, according to each weight in the optimized neural network submodel, each weight in the common convolutional neural network model to form an initial common convolutional neural network model;
S902、将所述初始化普通卷积神经网络模型中的普通卷积层转化为松弛卷积层,以形成所述初始化神经网络模型。S902. Convert a common convolution layer in the initialized general convolutional neural network model into a relaxation convolution layer to form the initialization neural network model.
在本实施例的步骤S901中,可以根据该优化的神经网络子模型中的各隐含层的权值,对该普通卷积神经网络模型500中对应的隐含层的权值进行初始化,以形成初始化普通卷积神经网络模型,其中,该初始化普通卷积神经网络模型的各隐含层的输出特性与该优化的神经网络子模型的各隐含层的输出特性相同,例如,可以将该优化的神经网络子模型中的隐含层的各权值乘以预定的系数,作为普通卷积神经网络模型中的对应的隐含层的各权值。In step S901 of the embodiment, the weights of the corresponding hidden layers in the common convolutional neural network model 500 may be initialized according to the weights of the hidden layers in the optimized neural network submodel. Forming an initial general convolutional neural network model, wherein an output characteristic of each hidden layer of the initialized general convolutional neural network model is the same as an output characteristic of each hidden layer of the optimized neural network submodel, for example, The weights of the hidden layers in the optimized neural network submodel are multiplied by a predetermined coefficient as the weights of the corresponding hidden layers in the general convolutional neural network model.
下面,以图5的普通卷积神经网络模型500为例,说明对各权值进行初始化的过程。Next, the normal convolutional neural network model 500 of FIG. 5 will be taken as an example to describe the process of initializing the weights.
1、对于作为第一隐含层的卷积层202的各权值初始化1. Initializing the weights of the convolution layer 202 as the first hidden layer
在图5中,卷积层202与输入层201连接,该卷积层202是输入层201之后的第一隐含层。卷积层202的输入数据是输入层201的待识别数据,由该待识别数据与卷积层202的权值进行卷积得到卷积层202各神经元节点的输出数据。In FIG. 5, convolutional layer 202 is coupled to input layer 201, which is the first hidden layer after input layer 201. The input data of the convolutional layer 202 is the data to be identified of the input layer 201, and the data to be identified is convoluted with the weight of the convolutional layer 202 to obtain the output data of each neuron node of the convolutional layer 202.
图10(A)是优化后的神经网络子模型800的输入层201和卷积层802的示意图,图10(B)是初始化后的普通卷积神经网络模型500的输入层201和卷积层202的示意图。10(A) is a schematic diagram of the input layer 201 and the convolution layer 802 of the optimized neural network sub-model 800, and FIG. 10(B) is the input layer 201 and the convolution layer of the initialized normal convolutional neural network model 500. A schematic diagram of 202.
如图10(A)所示,在神经网络子模型800中,输入数据与权值K1卷积得到神经元节点8021输出的特征图A1,输入数据与权值K2卷积得到特征图A2。如图10(B)所示,将权值K1乘以预定的系数L11,作为普通卷积神经网络子模型500卷 积层202的神经元节点2021、2023所对应的权值,将权值K2乘以预定的系数L12,作为普通卷积神经网络子模型500卷积层202的神经元节点2022、2024所对应的权值,从而对普通卷积神经网络模型500的卷积层202的各权值进行初始化。As shown in FIG. 10(A), in the neural network submodel 800, the input data is convoluted with the weight K1 to obtain the feature map A1 output by the neuron node 8021, and the input data is convoluted with the weight K2 to obtain the feature map A2. As shown in FIG. 10(B), the weight K1 is multiplied by a predetermined coefficient L11 as a common convolutional neural network submodel 500 volume. The weights corresponding to the neuron nodes 2021, 2023 of the layer 202 multiply the weight K2 by a predetermined coefficient L12, which corresponds to the neuron nodes 2022, 2024 of the convolutional layer 202 of the common convolutional neural network submodel 500. The weights are thus initialized for each weight of the convolutional layer 202 of the normal convolutional neural network model 500.
在本实施例中,预定的系数L11、L12可以均为1,因此,普通卷积神经网络模型500的卷积层202的神经元节点2021、2022、2023、2024输出的特征图分别为A1、A2、A3、A4,其中,A1=A3,A2=A4,由此,初始化后的普通卷积神经网络模型500的卷积层202与优化后的神经网络子模型800的卷积层802具有相同的输出特性。In this embodiment, the predetermined coefficients L11 and L12 may both be 1, and therefore, the feature maps output by the neuron nodes 2021, 2022, 2023, and 2024 of the convolutional layer 202 of the common convolutional neural network model 500 are respectively A1. A2, A3, A4, where A1 = A3, A2 = A4, whereby the convolutional layer 202 of the normalized convolutional neural network model 500 after initialization has the same convolutional layer 802 as the optimized neural network submodel 800 Output characteristics.
当然,在本实施例中,L11、L12也可以具有别的值,并且可以彼此不同。Of course, in the present embodiment, L11, L12 may have other values, and may be different from each other.
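In code, this first-layer initialization amounts to repeating the sub-model's kernels; the sketch below uses assumed kernel sizes and the coefficients L11 = L12 = 1 mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Optimized sub-model: two kernels K1, K2 for the first convolutional layer
# (5x5 kernels on a single-channel input; sizes are assumed for illustration).
K1 = rng.standard_normal((5, 5))
K2 = rng.standard_normal((5, 5))

# Initialization of the big model's first convolutional layer (4 neuron nodes):
# nodes 2021/2023 reuse K1 and nodes 2022/2024 reuse K2, multiplied by the
# predetermined coefficients L11 = L12 = 1, so that A1 = A3 and A2 = A4.
L11 = L12 = 1.0
big_first_layer = [K1 * L11, K2 * L12, K1 * L11, K2 * L12]
print(len(big_first_layer), big_first_layer[0].shape)
```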
2、对于作为非第一隐含层的卷积层504的各权值初始化2. Initializing the weights of the convolutional layer 504 as a non-first hidden layer
在图5中,卷积层202的各神经元节点所输出的特征图作为池化层203的输入数据,池化层203所输出的特征图作为卷积层504的输入数据,该卷积层504为非第一隐含层。In FIG. 5, the feature map output by each neuron node of the convolution layer 202 is used as input data of the pooling layer 203, and the feature map output by the pooling layer 203 is used as input data of the convolution layer 504, and the convolution layer 504 is a non-first hidden layer.
图11(A)是优化后的神经网络子模型800的池化层803和卷积层804的示意图，图11(B)是初始化后的普通卷积神经网络模型500的池化层203和卷积层504的示意图。FIG. 11(A) is a schematic diagram of the pooling layer 803 and the convolutional layer 804 of the optimized neural network sub-model 800, and FIG. 11(B) is a schematic diagram of the pooling layer 203 and the convolutional layer 504 of the initialized ordinary convolutional neural network model 500.
如图11(A)所示,池化层803的各神经元节点所输出的特征图B1、B2用于生成卷积层804的各神经元节点的特征图C1-C3。其中,在池化层803中,分别将图10(A)的A1、A2与相应的权值进行池化处理,得到B1、B2。As shown in FIG. 11(A), the feature maps B1, B2 outputted by the respective neuron nodes of the pooling layer 803 are used to generate feature maps C1-C3 of the respective neuron nodes of the convolutional layer 804. In the pooling layer 803, A1 and A2 of FIG. 10(A) and the corresponding weights are respectively pooled to obtain B1 and B2.
如图11(B)所示，池化层203的各神经元节点所输出的特征图B1-B4用于生成卷积层504的各神经元节点的特征图C1'-C6'。其中，在池化层203中，分别将图10(B)的A1-A4与相应的权值进行池化处理，得到B1-B4，并且，池化层203中的各权值可以是池化层803中的各权值乘以预定的系数得到的，该预定的系数例如可以是1，因此，在图11(B)中，B1=B3，B2=B4，由此，初始化后的普通卷积神经网络模型500的池化层203与优化后的神经网络子模型800的池化层803具有相同的输出特性。As shown in FIG. 11(B), the feature maps B1-B4 output by the neuron nodes of the pooling layer 203 are used to generate the feature maps C1'-C6' of the neuron nodes of the convolutional layer 504. In the pooling layer 203, A1-A4 of FIG. 10(B) are pooled with the corresponding weights to obtain B1-B4, and each weight in the pooling layer 203 may be obtained by multiplying the corresponding weight in the pooling layer 803 by a predetermined coefficient, which may be, for example, 1. Therefore, in FIG. 11(B), B1 = B3 and B2 = B4, so that the pooling layer 203 of the initialized ordinary convolutional neural network model 500 has the same output characteristics as the pooling layer 803 of the optimized neural network sub-model 800.
图12(A)是图11(A)的池化层803和卷积层804的一部分示意图,图12(B)是图11(B)的池化层203和卷积层504的一部分示意图,在图12(A)和12(B)中,示出了相应的权值。12(A) is a partial schematic view of the pooling layer 803 and the convolution layer 804 of FIG. 11(A), and FIG. 12(B) is a partial schematic view of the pooling layer 203 and the convolution layer 504 of FIG. 11(B). In Figures 12(A) and 12(B), the corresponding weights are shown.
如图12(A)所示，在神经网络子模型800中，B1、B2与权值K3、K4进行卷积，得到神经元节点8041输出的特征图C1，该卷积如下式(5)所示：As shown in FIG. 12(A), in the neural network sub-model 800, B1 and B2 are convolved with the weights K3 and K4 to obtain the feature map C1 output by neuron node 8041, and this convolution (denoted ⊗ below) is as shown in the following equation (5):

C1 = B1 ⊗ K3 + B2 ⊗ K4    (5)
如图12(B)所示,将权值K3乘以预定的系数L21,得到K3’,作为普通卷积神经网络子模型500的卷积层504中与特征图B1、B3所对应的权值,将权值K4乘以预定的系数L22,得到K4’,作为普通卷积神经网络子模型500的卷积层504中与特征图B2、B4所对应的权值,从而对普通卷积神经网络模型500的卷积层504的各权值进行初始化。As shown in FIG. 12(B), the weight K3 is multiplied by a predetermined coefficient L21 to obtain K3' as the weight corresponding to the feature maps B1, B3 in the convolutional layer 504 of the ordinary convolutional neural network submodel 500. Multiplying the weight K4 by a predetermined coefficient L22 to obtain K4' as the weight corresponding to the feature maps B2 and B4 in the convolutional layer 504 of the ordinary convolutional neural network submodel 500, thereby the common convolutional neural network The weights of the convolutional layer 504 of the model 500 are initialized.
在本实施例中，预定的系数L21、L22可以均为1/2，因此，K3'=K3*1/2，K4'=K4*1/2，由此，神经元节点5041所输出的特征图C1'可以如下式(6)所示：In this embodiment, the predetermined coefficients L21 and L22 may both be 1/2, so that K3' = K3*1/2 and K4' = K4*1/2, and the feature map C1' output by neuron node 5041 can be expressed as shown in the following equation (6):

C1' = B1 ⊗ K3' + B2 ⊗ K4' + B3 ⊗ K3' + B4 ⊗ K4' = B1 ⊗ K3 + B2 ⊗ K4    (6)
可见,C1’=C1,由此,初始化后的普通卷积神经网络模型500的卷积层504与优化后的神经网络子模型800的卷积层804具有相同的输出特性。It can be seen that C1' = C1, whereby the convolutional layer 504 of the normalized convolutional neural network model 500 after initialization has the same output characteristics as the convolutional layer 804 of the optimized neural network submodel 800.
当然,在本实施例中,L21、L22也可以具有别的值,并且可以彼此不同。Of course, in the present embodiment, L21, L22 may have other values, and may be different from each other.
在本实施例中，可以采用与上述类似的方法，使用图11(A)的B1、B2与C2之间的权值来对图11(B)的B1-B4与C2'之间的权值进行初始化，使用图11(A)的B1、B2与C3之间的权值来对图11(B)的B1-B4与C3'之间的权值进行初始化，由此，C2'=C2，C3'=C3。In this embodiment, in a similar manner to the above, the weights between B1, B2 and C2 in FIG. 11(A) can be used to initialize the weights between B1-B4 and C2' in FIG. 11(B), and the weights between B1, B2 and C3 in FIG. 11(A) can be used to initialize the weights between B1-B4 and C3' in FIG. 11(B); thus, C2' = C2 and C3' = C3.
在本实施例中，可以采用与上述类似的方法，使用图11(A)的B1、B2与C1之间的权值来对图11(B)的B1-B4与C4'之间的权值进行初始化，如图12(B)所示，可以根据K3、K4来生成K3'、K4'，从而能够根据B1-B4与K3'、K4'生成神经元节点5044所输出的特征图C4'，并且C1'=C4'。同样地，还可以使用图11(A)的B1、B2与C2之间的权值来对图11(B)的B1-B4与C5'之间的权值进行初始化，使用图11(A)的B1、B2与C3之间的权值来对图11(B)的B1-B4与C6'之间的权值进行初始化，由此，C2'=C5'，C3'=C6'。In this embodiment, in a similar manner to the above, the weights between B1, B2 and C1 in FIG. 11(A) can be used to initialize the weights between B1-B4 and C4' in FIG. 11(B); as shown in FIG. 12(B), K3' and K4' can be generated from K3 and K4, so that the feature map C4' output by neuron node 5044 can be generated from B1-B4 and K3', K4', and C1' = C4'. Similarly, the weights between B1, B2 and C2 in FIG. 11(A) can be used to initialize the weights between B1-B4 and C5' in FIG. 11(B), and the weights between B1, B2 and C3 in FIG. 11(A) can be used to initialize the weights between B1-B4 and C6' in FIG. 11(B); thus, C2' = C5' and C3' = C6'.
在本实施例中,如果在卷积层504之后还有其他的卷积层,则该其他的卷积层中各权值的初始化方法与对卷积层504中各权值的初始化方法相似。In the present embodiment, if there are other convolution layers after the convolutional layer 504, the initialization methods of the weights in the other convolutional layers are similar to the initialization methods for the weights in the convolutional layer 504.
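A small numeric check of equations (5) and (6) can be written as follows; 1x1 "kernels" are used so that the convolution reduces to elementwise scaling, which keeps the example short without changing the argument.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sub-model kernels K3, K4 acting on the two pooled maps B1, B2 (scalar 1x1
# "kernels" and made-up feature maps, purely for illustration).
K3, K4 = 2.0, -1.0
B1, B2 = rng.random((4, 4)), rng.random((4, 4))
C1 = B1 * K3 + B2 * K4                     # equation (5)

# Big model: inputs B1-B4 with B3 = B1 and B4 = B2; weights scaled by L21 = L22 = 1/2.
B3, B4 = B1, B2
K3p, K4p = K3 * 0.5, K4 * 0.5
C1p = B1 * K3p + B2 * K4p + B3 * K3p + B4 * K4p   # equation (6)

print(np.allclose(C1, C1p))                # True: C1' = C1
```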
3、对于全连接层205的各权值初始化 3. Initialize the weights of the fully connected layer 205.
在图5中,全连接层205可以位于全部的卷积层之后,并且,全连接层205的后面可以连接输出层206,即,该全连接层205为最后隐含层,或者,全连接层205的后面可以连接有其他的全连接层,即,该全连接层205为非最后隐含层。In FIG. 5, the fully connected layer 205 may be located after all the convolutional layers, and the output layer 206 may be connected to the rear of the fully connected layer 205, that is, the fully connected layer 205 is the last hidden layer, or the fully connected layer. Other fully connected layers may be connected to the rear of 205, i.e., the fully connected layer 205 is a non-final hidden layer.
当全连接层205为非最后隐含层的情况下,全连接层205的各权值的初始化方法可以参考位于第一隐含层之后的卷积层504的初始化方法,并且,在全连接层205中,用乘法操作代替卷积操作,这是因为,全连接层205可以看成是权值为1×1的卷积层。When the fully-connected layer 205 is a non-last hidden layer, the initialization method of each weight of the fully-connected layer 205 may refer to the initialization method of the convolution layer 504 located after the first hidden layer, and at the fully-connected layer. In 205, the convolution operation is replaced by a multiplication operation because the fully-connected layer 205 can be regarded as a convolutional layer having a weight of 1 × 1.
下面,说明当全连接层205为最后隐含层的情况下,对全连接层205的各权值的初始化方法。Next, a method of initializing the weights of the all-connection layer 205 when the fully-connected layer 205 is the last hidden layer will be described.
作为最后隐含层的全连接层205具有与输出层206的类别数一样多的神经元节点。所以,对于优化后的神经网络子模型800的全连接层805和初始化后的普通卷积神经网络模型500的全连接层205,二者的神经元节点的数量相同,但是,二者的输入数据的数量可以不同。The fully connected layer 205, which is the last hidden layer, has as many neuron nodes as the number of classes of the output layer 206. Therefore, for the fully connected layer 805 of the optimized neural network submodel 800 and the fully connected layer 205 of the initialized normal convolutional neural network model 500, the number of neuron nodes is the same, but the input data of the two The number can vary.
图13(A)是优化后的神经网络子模型800的全连接层805及其前一隐含层的示意图,图13(B)是初始化后的普通卷积神经网络模型500的全连接层205及其前一隐含层的示意图。13(A) is a schematic diagram of the fully connected layer 805 of the optimized neural network submodel 800 and its previous hidden layer, and FIG. 13(B) is the fully connected layer 205 of the initialized normal convolutional neural network model 500. And a schematic diagram of the previous hidden layer.
如图13(A)所示,前一隐含层各神经元节点所输出的数据F1、F2用于生成全连接层805的各神经元节点的输出数据E1-E3。如图13(B)所示,前一隐含层的各神经元节点所输出的数据F1-F4用于生成全连接层205的各神经元节点的输出数据E1’-E3’。其中,数据F1-F4可以是浮点数的形式。As shown in FIG. 13(A), the data F1, F2 outputted by the neuron nodes of the previous hidden layer are used to generate the output data E1-E3 of the respective neuron nodes of the fully connected layer 805. As shown in Fig. 13(B), the data F1-F4 outputted by the respective neuron nodes of the previous hidden layer are used to generate the output data E1'-E3' of the respective neuron nodes of the fully connected layer 205. Among them, the data F1-F4 may be in the form of a floating point number.
在图13(A)中,由于之前提取子模型的操作,其前一隐含层的神经元节点的个数比图13(B)的前一隐含层的个数减少了一半,所以,图13(A)中的前一隐含层输出的数据的个数也是图13(B)中的前一隐含层输出的数据的个数的一半。在图13(B)中,前一隐含层在经过初始化以后,可以满足例如下面的条件,F1=F3,F2=F4。In FIG. 13(A), the number of neuron nodes of the previous hidden layer is reduced by half compared with the number of the previous hidden layer of FIG. 13(B) due to the operation of the previous extraction submodel. The number of data outputted by the previous hidden layer in Fig. 13(A) is also half the number of data outputted by the previous hidden layer in Fig. 13(B). In Fig. 13(B), the previous hidden layer can satisfy, for example, the following conditions after initialization, F1 = F3, F2 = F4.
图14(A)是图13(A)的全连接层805及其前一隐含层的一部分示意图,图14(B)是图13(B)的全连接层205及其前一隐含层的一部分示意图,在图14(A)和14(B)中,示出了相应的权值。Figure 14 (A) is a partial schematic view of the fully-connected layer 805 of Figure 13 (A) and its previous hidden layer, Figure 14 (B) is the fully-connected layer 205 of Figure 13 (B) and its previous hidden layer A portion of the schematic diagram, in Figures 14(A) and 14(B), shows the corresponding weights.
如图14(A)所示,在神经网络子模型800中,F1、F2与权值K5、K6相乘,得到神经元节点8051输出的数据E1,该相乘如下式(7)所示: As shown in FIG. 14(A), in the neural network submodel 800, F1 and F2 are multiplied by weights K5 and K6 to obtain data E1 outputted by the neuron node 8051, and the multiplication is as shown in the following equation (7):
E1=F1*K5+F2*K6             (7)E1=F1*K5+F2*K6 (7)
如图14(B)所示，将权值K5乘以预定的系数L31，得到K5'，作为普通卷积神经网络模型500的全连接层205中与F1、F3所对应的权值，将权值K6乘以预定的系数L32，得到K6'，作为普通卷积神经网络模型500的全连接层205中与F2、F4所对应的权值，从而对普通卷积神经网络模型500的全连接层205的各权值进行初始化。As shown in FIG. 14(B), the weight K5 is multiplied by a predetermined coefficient L31 to obtain K5', which is used as the weight corresponding to F1 and F3 in the fully connected layer 205 of the ordinary convolutional neural network model 500, and the weight K6 is multiplied by a predetermined coefficient L32 to obtain K6', which is used as the weight corresponding to F2 and F4 in the fully connected layer 205 of the ordinary convolutional neural network model 500, thereby initializing the weights of the fully connected layer 205 of the ordinary convolutional neural network model 500.
在本实施例中，预定的系数L31、L32可以均为1/2，因此，K5'=K5*1/2，K6'=K6*1/2，由此，神经元节点2051所输出的数据E1'可以如下式(8)所示：In this embodiment, the predetermined coefficients L31 and L32 may both be 1/2, so that K5' = K5*1/2 and K6' = K6*1/2, and the data E1' output by neuron node 2051 can be expressed as shown in the following equation (8):

E1' = F1*K5' + F2*K6' + F3*K5' + F4*K6' = F1*K5 + F2*K6    (8)
可见,E1’=E1,由此,初始化后的普通卷积神经网络模型500的全连接层205与优化后的神经网络子模型800的全连接层805具有相同的输出特性。It can be seen that E1' = E1, whereby the fully connected layer 205 of the initialized normal convolutional neural network model 500 has the same output characteristics as the fully connected layer 805 of the optimized neural network submodel 800.
当然,在本实施例中,L31、L32也可以具有别的值,并且可以彼此不同。Of course, in the present embodiment, L31 and L32 may have other values, and may be different from each other.
在本实施例中,可以采用与上述类似的方法,使用图13(A)的F1、F2与E2之间的权值来对图13(B)的F1-F4与E2’之间的权值进行初始化,使用图13(A)的F1、F2与E3之间的权值来对图13(B)的F1-F4与E3’之间的权值进行初始化,由此,E2’=E2,E3’=E3。In the present embodiment, the weight between F1, F4 and E2' of Fig. 13(B) can be used in a similar manner to the above, using the weights between F1, F2 and E2 of Fig. 13(A). Initialization is performed to initialize the weight between F1-F4 and E3' of FIG. 13(B) using the weights between F1, F2, and E3 of FIG. 13(A), whereby E2'=E2, E3'=E3.
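The same check for the fully connected layer, equations (7) and (8), can be written with made-up scalar values:

```python
# Sub-model fully connected layer: inputs F1, F2 and weights K5, K6 (made-up values).
F1, F2, K5, K6 = 0.3, 0.8, 1.5, -0.4
E1 = F1 * K5 + F2 * K6                     # equation (7)

# Big model: inputs F1-F4 with F3 = F1, F4 = F2; weights scaled by L31 = L32 = 1/2.
F3, F4 = F1, F2
K5p, K6p = K5 * 0.5, K6 * 0.5
E1p = F1 * K5p + F2 * K6p + F3 * K5p + F4 * K6p   # equation (8)

print(abs(E1 - E1p) < 1e-12)               # True: E1' = E1
```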
根据上述的1-3的操作,可以对图5所示的普通卷积神经网络模型500中的各权值进行初始化。当然,普通卷积神经网络模型500并不限于图5所示的结果,普通卷积神经网络模型500中还可以具有其他的隐含层,对该其他的隐含层的初始化方法,可以参考上述1-3的操作。According to the operations of 1-3 described above, the weights in the ordinary convolutional neural network model 500 shown in FIG. 5 can be initialized. Of course, the general convolutional neural network model 500 is not limited to the result shown in FIG. 5. The ordinary convolutional neural network model 500 may have other hidden layers. For the initialization method of the other hidden layers, reference may be made to the above. 1-3 operation.
在本实施例的步骤S902中,将普通卷积层转化为松弛卷积层的处理可以是步骤S401的逆向处理,即,在步骤S902中,可以将普通卷积层内的权值复制多份,使得参与卷积操作的同一个神经元节点内的不同位置的数据对应不同的权值,例如,将图7中的权值W1复制为W11、W14,将权值W2复制为W21、W24,由此,将普通卷积层转化为松弛卷积层。In step S902 of the embodiment, the process of converting the normal convolution layer into the slack convolution layer may be the reverse process of step S401, that is, in step S902, the weights in the ordinary convolution layer may be copied in multiple copies. The data of different positions in the same neuron node participating in the convolution operation are corresponding to different weights. For example, the weight W1 in FIG. 7 is copied into W11 and W14, and the weight W2 is copied into W21 and W24. Thereby, the ordinary convolution layer is converted into a relaxed convolution layer.
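The copying described here can be sketched as a simple weight replication; the number of positions below is assumed for illustration.

```python
import numpy as np

# Step S902 sketch, inverse of S401: copy the single shared weight of each input
# node so that every output position gets its own (initially identical) weight.
ordinary_w = np.array([1.0, 3.0])                  # W1 for node P1, W2 for node P2
num_positions = 2                                  # e.g. the positions producing T11 and T14
relaxed_w = np.repeat(ordinary_w[:, None], num_positions, axis=1)
print(relaxed_w)   # [[1. 1.], [3. 3.]] -- W11 = W14 = W1 and W21 = W24 = W2 right after conversion
```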
图15是本实施例对神经网络模型中的各权值进行初始化的方法的另一个示意图,用于实现步骤S303。如图15所示,该方法包括: FIG. 15 is another schematic diagram of a method for initializing each weight in the neural network model in the embodiment, for implementing step S303. As shown in Figure 15, the method includes:
S901、根据所述优化的神经网络子模型中的各权值,初始化所述普通卷积神经网络模型中的各权值,以形成初始化普通卷积神经网络模型;S901: Initialize, according to each weight in the optimized neural network submodel, each weight in the common convolutional neural network model to form an initial common convolutional neural network model;
S1501、基于已知训练集,对初始化普通卷积神经网络模型中的各权值进行调整,以形成调整后普通卷积神经网络模型;以及S1501: Adjusting, according to a known training set, each weight in an initial general convolutional neural network model to form an adjusted ordinary convolutional neural network model;
S1502、将所述调整后普通卷积神经网络模型中的普通卷积层转化为松弛卷积层,以形成所述初始化神经网络模型。S1502: Convert a common convolution layer in the adjusted ordinary convolutional neural network model into a relaxation convolution layer to form the initialization neural network model.
图15的方法与图9的方法的不同之处在于,在图15的方法中增加了步骤S1501,即,在得到初始化普通卷积神经网络模型之后,对普通卷积神经网络模型进行调整,由此,能够减轻后续在步骤S304中进行调整时的工作量。关于步骤S1501中进行调整的方法,可以参考现有技术中对神经网络模型进行调整的方法,本实施例不再描述。The method of FIG. 15 is different from the method of FIG. 9 in that step S1501 is added to the method of FIG. 15, that is, after the normal convolutional neural network model is initialized, the ordinary convolutional neural network model is adjusted by Thereby, the amount of work when the adjustment is performed in step S304 can be alleviated. For the method for performing the adjustment in step S1501, reference may be made to the method for adjusting the neural network model in the prior art, which is not described in this embodiment.
在本实施例中,步骤S1502中的将普通卷积层转化为松弛卷积层的处理方法可以与步骤S902的处理方法相同。In the present embodiment, the processing method of converting the ordinary convolution layer into the slack convolution layer in step S1502 may be the same as the processing method of step S902.
上述图9和图15所示的方法中，先在步骤S901中对普通卷积神经网络模型中的各权值进行初始化，再通过步骤S902或步骤S1501与S1502将普通卷积层转化为松弛卷积层，从而对神经网络模型200的各权值进行初始化；但是，本实施例并不限于此，如果神经网络模型200不具有松弛卷积层，则可以在步骤S901中直接对神经网络模型200的各权值进行初始化，而无需步骤S902或步骤S1501与S1502。In the methods shown in FIG. 9 and FIG. 15 above, the weights in the ordinary convolutional neural network model are first initialized in step S901, and then the ordinary convolutional layer is converted into a relaxed convolutional layer through step S902 or steps S1501 and S1502, thereby initializing the weights of the neural network model 200. However, this embodiment is not limited thereto; if the neural network model 200 does not have a relaxed convolutional layer, the weights of the neural network model 200 may be initialized directly in step S901, without step S902 or steps S1501 and S1502.
在本实施例的步骤S304中,可以基于已知的训练集,对初始化神经网络模型中的各权值进行调整,调整的方式可以参考现有技术,本实施例不再详细说明。In the step S304 of the embodiment, the weights in the initialization neural network model may be adjusted based on the known training set, and the manner of the adjustment may be referred to the prior art, which is not described in detail in this embodiment.
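For completeness, a generic gradient-descent sketch of such an adjustment step is given below; it is standard prior-art fine-tuning (softmax cross-entropy on a made-up training set), not a procedure specific to this application, and the data, learning rate and number of rounds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Known training set and a weight matrix standing in for the initialized weights.
X = rng.random((32, 10))                   # made-up training inputs
y = rng.integers(0, 2, size=32)            # made-up binary labels
W = rng.standard_normal((10, 2)) * 0.1     # weights to be adjusted

lr = 0.1
for _ in range(5):                         # only a few rounds are needed after initialization
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)      # softmax probabilities
    grad = X.T @ (p - np.eye(2)[y]) / len(X)   # cross-entropy gradient w.r.t. W
    W -= lr * grad
```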
根据本实施例，训练小规模的神经网络模型，并由小规模的神经网络模型对大规模的神经网络模型进行初始化，最后对初始化后的大规模的神经网络模型进行微调，由于小规模的网络已经完成了大部分的训练工作，因此对大规模的网络只需要微调几轮即可收敛，由此，能够避免对大规模的神经网络进行直接训练所产生的过拟合和训练时间过长等问题。According to this embodiment, a small-scale neural network model is trained, the large-scale neural network model is initialized by the small-scale neural network model, and finally the initialized large-scale neural network model is fine-tuned. Since the small-scale network has already completed most of the training work, the large-scale network only needs a few rounds of fine-tuning to converge, so that problems such as over-fitting and excessive training time caused by directly training a large-scale neural network can be avoided.
实施例2Example 2
实施例2提供一种对神经网络模型进行训练的装置,与实施例1的方法对应。 Embodiment 2 provides an apparatus for training a neural network model, which corresponds to the method of Embodiment 1.
图16是本实施例2的对神经网络模型进行训练的装置的一个示意图,如图16所示,该装置1600包括:提取单元1601,第一训练单元1602,初始化单元1603, 以及第二训练单元1604。16 is a schematic diagram of an apparatus for training a neural network model according to the second embodiment. As shown in FIG. 16, the apparatus 1600 includes: an extracting unit 1601, a first training unit 1602, and an initializing unit 1603. And a second training unit 1604.
其中,提取单元1601用于提取神经网络模型的一部分,以形成神经网络子模型;第一训练单元1602用于对所述神经网络子模型进行训练,以形成优化的神经网络子模型;初始化单元1603根据所述优化的神经网络子模型中的各权值,初始化所述神经网络模型中的各权值,以形成初始化神经网络模型,并且,所述初始化神经网络模型与所述优化的神经网络子模型具有相同的输出特性;第二训练单元1604基于已知训练集,对所述初始化神经网络模型中的各权值进行调整。The extracting unit 1601 is configured to extract a part of the neural network model to form a neural network sub-model; the first training unit 1602 is configured to train the neural network sub-model to form an optimized neural network sub-model; and the initializing unit 1603 Initializing each weight in the neural network model according to each weight in the optimized neural network sub-model to form an initialization neural network model, and the initializing the neural network model and the optimized neural network The models have the same output characteristics; the second training unit 1604 adjusts the weights in the initialized neural network model based on the known training set.
图17是本实施例2的提取单元1601的一个示意图,如图17所示,该提取单元1601包括第一转化单元1701和提取子单元1702。17 is a schematic diagram of the extracting unit 1601 of the second embodiment. As shown in FIG. 17, the extracting unit 1601 includes a first converting unit 1701 and an extracting subunit 1702.
其中,第一转化单元1701用于将所述神经网络模型中的松弛卷积层转化为普通卷积层,以将所述神经网络模型转化为普通卷积神经网络模型;提取子单元1702用于提取所述普通卷积神经网络模型的每一个隐含层中的部分神经元节点,以形成所述神经网络子模型。The first conversion unit 1701 is configured to convert the relaxed convolution layer in the neural network model into a common convolution layer to convert the neural network model into a common convolutional neural network model; and the extraction subunit 1702 is used to A portion of the neuron nodes in each of the hidden layers of the common convolutional neural network model are extracted to form the neural network sub-model.
图18是本实施例2的初始化单元1603的一个示意图,如图18所示,该初始化单元1603包括第一初始化子单元1801和第二转化单元1802。FIG. 18 is a schematic diagram of the initialization unit 1603 of the second embodiment. As shown in FIG. 18, the initialization unit 1603 includes a first initialization subunit 1801 and a second conversion unit 1802.
其中,第一初始化子单元1801用于根据所述优化的神经网络子模型中的各权值,初始化所述普通卷积神经网络模型中的各权值,以形成初始化普通卷积神经网络模型;第二转化单元1802用于将所述初始化普通卷积神经网络模型中的普通卷积层转化为松弛卷积层,以形成所述初始化神经网络模型。The first initialization subunit 1801 is configured to initialize each weight in the common convolutional neural network model according to each weight in the optimized neural network submodel to form an initialization common convolutional neural network model; The second transforming unit 1802 is configured to convert the normal convolutional layer in the initialized normal convolutional neural network model into a relaxed convolutional layer to form the initialized neural network model.
在本实施例中,第一初始化子单元1801可以根据所述优化的神经网络子模型中的各隐含层的权值,对所述普通卷积神经网络模型中对应的隐含层的权值进行初始化,以形成初始化普通卷积神经网络模型,其中,所述初始化普通卷积神经网络模型的各隐含层的输出特性与所述优化的神经网络子模型的各隐含层的输出特性相同,例如,第一初始化子单元1801可以将所述优化的神经网络子模型中的隐含层的各权值乘以预定的系数,作为所述普通卷积神经网络模型中的对应的隐含层的各权值。In this embodiment, the first initialization subunit 1801 may determine the weight of the corresponding hidden layer in the common convolutional neural network model according to the weight of each hidden layer in the optimized neural network submodel. Initializing to form an initialization common convolutional neural network model, wherein the output characteristics of the implicit layers of the initialized general convolutional neural network model are the same as the output characteristics of the hidden layers of the optimized neural network submodel For example, the first initialization subunit 1801 may multiply each weight of the hidden layer in the optimized neural network submodel by a predetermined coefficient as a corresponding hidden layer in the common convolutional neural network model. The weights of each.
在本实施例中,第二转化单元1802可以采用与第一转化单元1701相反的操作,将普通卷积神经网络模型转换为松弛卷积神经网络模型。In the present embodiment, the second converting unit 1802 may convert the ordinary convolutional neural network model into a relaxed convolutional neural network model by using an operation opposite to that of the first converting unit 1701.
图19是本实施例2的初始化单元1603的另一个示意图,如图19所示,该初始化单元1603包括:第二初始化子单元1901、第三训练单元1902、以及第三转化单元 1903。FIG. 19 is another schematic diagram of the initializing unit 1603 of the second embodiment. As shown in FIG. 19, the initializing unit 1603 includes: a second initializing subunit 1901, a third training unit 1902, and a third converting unit. 1903.
其中,第二初始化子单元1901根据所述优化的神经网络子模型中的各权值,初始化所述普通卷积神经网络模型中的各权值,以形成初始化普通卷积神经网络模型;第三训练单元1902基于已知训练集,对初始化普通卷积神经网络模型中的各权值进行调整,以形成调整后普通卷积神经网络模型;第三转化单元1903用于将所述调整后普通卷积神经网络模型中的普通卷积层转化为松弛卷积层,以形成所述初始化神经网络模型。The second initialization subunit 1901 initializes each weight in the common convolutional neural network model according to each weight in the optimized neural network submodel to form an initial common convolutional neural network model; The training unit 1902 adjusts each weight in the initialized normal convolutional neural network model to form an adjusted ordinary convolutional neural network model based on the known training set; the third transformation unit 1903 is configured to use the adjusted normal volume The ordinary convolutional layer in the neural network model is transformed into a relaxed convolutional layer to form the initialized neural network model.
在本实施例中,第二初始化子单元1901的处理方式可以与第一初始化子单元1801的处理方式相同;第三训练单元1902对初始化普通卷积神经网络模型中的各权值进行调整的方式可以参考现有技术;第三转化单元1903将普通卷积层转化为松弛卷积层的方式可以参考第二转化单元1802。In this embodiment, the processing manner of the second initialization subunit 1901 may be the same as the processing manner of the first initialization subunit 1801; and the manner in which the third training unit 1902 adjusts the weights in the normal convolutional neural network model. Reference may be made to the prior art; the third conversion unit 1903 may refer to the second conversion unit 1802 by converting the ordinary convolution layer into a relaxed convolution layer.
根据本实施例，训练小规模的神经网络模型，并由小规模的神经网络模型对大规模的神经网络模型进行初始化，最后对初始化后的大规模的神经网络模型进行微调，由于小规模的网络已经完成了大部分的训练工作，因此对大规模的网络只需要微调几轮即可收敛，由此，能够避免对大规模的神经网络进行直接训练所产生的过拟合和训练时间过长等问题。According to this embodiment, a small-scale neural network model is trained, the large-scale neural network model is initialized by the small-scale neural network model, and finally the initialized large-scale neural network model is fine-tuned. Since the small-scale network has already completed most of the training work, the large-scale network only needs a few rounds of fine-tuning to converge, so that problems such as over-fitting and excessive training time caused by directly training a large-scale neural network can be avoided.
实施例3Example 3
本申请实施例3提供一种电子设备,该电子设备包括如实施例2所述的对神经网络模型进行训练的装置。 Embodiment 3 of the present application provides an electronic device including the device for training a neural network model as described in Embodiment 2.
图20是本发明实施例的电子设备2000的系统构成的示意框图。如图20所示,该电子设备2000可以包括中央处理器2100和存储器2140;存储器2140耦合到中央处理器2100。值得注意的是,该图是示例性的;还可以使用其他类型的结构,来补充或代替该结构,以实现电信功能或其他功能。FIG. 20 is a schematic block diagram showing the system configuration of an electronic device 2000 according to an embodiment of the present invention. As shown in FIG. 20, the electronic device 2000 can include a central processor 2100 and a memory 2140; the memory 2140 is coupled to the central processor 2100. It should be noted that the figure is exemplary; other types of structures may be used in addition to or in place of the structure to implement telecommunications functions or other functions.
在一个实施方式中,对神经网络模型进行训练的装置的功能可以被集成到中央处理器2100中。其中,中央处理器2100可以被配置为:In one embodiment, the functionality of the device that trains the neural network model can be integrated into the central processor 2100. The central processing unit 2100 can be configured to:
提取神经网络模型的一部分,以形成神经网络子模型;对所述神经网络子模型进行训练,以形成优化的神经网络子模型;根据所述优化的神经网络子模型中的各权值,初始化所述神经网络模型中的各权值,以形成初始化神经网络模型,并且,所述初始 化神经网络模型与所述优化的神经网络子模型具有相同的输出特性;基于已知训练集,对所述初始化神经网络模型中的各权值进行调整。Extracting a portion of the neural network model to form a neural network sub-model; training the neural network sub-model to form an optimized neural network sub-model; and initializing the weight according to each weight in the optimized neural network sub-model Deriving weights in a neural network model to form an initialized neural network model, and the initial The neural network model has the same output characteristics as the optimized neural network sub-model; and the weights in the initialized neural network model are adjusted based on the known training set.
The central processing unit 2100 may further be configured to: convert the relaxed convolution layers in the neural network model into ordinary convolution layers, so as to convert the neural network model into an ordinary convolutional neural network model; and extract some of the neuron nodes in each hidden layer of the ordinary convolutional neural network model to form the neural network submodel.
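One plausible reading of extracting some of the neuron nodes in each hidden layer, offered here purely as an illustration, is to keep only a fixed fraction of the output channels of every ordinary convolution layer. The sketch below builds such a narrower layer; the 50% keep ratio and the example layer sizes are assumptions, not values taken from this application.

import torch.nn as nn

def shrink_conv(conv: nn.Conv2d, keep: float, in_kept: int) -> nn.Conv2d:
    # Return a narrower Conv2d representing a subset of the original layer's
    # neuron nodes: only a `keep` fraction of the output channels survives,
    # and the input channel count matches what the previous shrunken layer kept.
    out_kept = max(1, int(round(conv.out_channels * keep)))
    return nn.Conv2d(in_kept, out_kept, conv.kernel_size,
                     stride=conv.stride, padding=conv.padding,
                     bias=conv.bias is not None)

# Hypothetical example: a two-layer ordinary CNN shrunk to half width per hidden layer.
# conv1 = nn.Conv2d(1, 64, 3, padding=1)
# conv2 = nn.Conv2d(64, 128, 3, padding=1)
# sub1 = shrink_conv(conv1, keep=0.5, in_kept=1)                   # 64 -> 32 output channels
# sub2 = shrink_conv(conv2, keep=0.5, in_kept=sub1.out_channels)   # 128 -> 64 output channels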
The central processing unit 2100 may further be configured to: initialize the weights in the ordinary convolutional neural network model according to the weights in the optimized neural network submodel, to form an initialized ordinary convolutional neural network model; and convert the ordinary convolution layers in the initialized ordinary convolutional neural network model into relaxed convolution layers, to form the initialized neural network model.
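As a non-authoritative illustration of this layer conversion, assume that a relaxed convolution layer is an ordinary convolution whose weight sharing across spatial positions has been removed, i.e. every output position holds its own kernel. Converting an ordinary convolution layer into a relaxed one can then be sketched as tiling the shared kernel over all output positions, so that the converted layer initially computes exactly the same function and the initialized model keeps the submodel's output characteristics. This reading of "relaxed convolution" is an assumption made for the example only.

import torch

def relax_conv_weight(shared_weight: torch.Tensor, out_h: int, out_w: int) -> torch.Tensor:
    # Tile a shared kernel of shape (out_ch, in_ch, kh, kw) into per-position
    # kernels of shape (out_h, out_w, out_ch, in_ch, kh, kw). Right after
    # tiling, the untied (relaxed) layer behaves identically to the ordinary
    # layer; the copies are free to drift apart during later fine-tuning.
    return shared_weight.unsqueeze(0).unsqueeze(0).expand(
        out_h, out_w, *shared_weight.shape).clone()

# Under the same assumption, the opposite direction used when forming the
# submodel (relaxed -> ordinary) could average the per-position kernels:
# shared = relaxed_weight.mean(dim=(0, 1))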
The central processing unit 2100 may further be configured to: initialize the weights in the ordinary convolutional neural network model according to the weights in the optimized neural network submodel, to form an initialized ordinary convolutional neural network model; adjust the weights in the initialized ordinary convolutional neural network model based on the known training set, to form an adjusted ordinary convolutional neural network model; and convert the ordinary convolution layers in the adjusted ordinary convolutional neural network model into relaxed convolution layers, to form the initialized neural network model.
The central processing unit 2100 may further be configured to: initialize the weights of the corresponding hidden layers in the ordinary convolutional neural network model according to the weights of the hidden layers in the optimized neural network submodel, to form the initialized ordinary convolutional neural network model, wherein the output characteristics of each hidden layer of the initialized ordinary convolutional neural network model are the same as the output characteristics of the corresponding hidden layer of the optimized neural network submodel.
The central processing unit 2100 may further be configured to: multiply the weights of a hidden layer in the optimized neural network submodel by a predetermined coefficient, and use the results as the weights of the corresponding hidden layer in the ordinary convolutional neural network model.
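The value of the predetermined coefficient is not fixed by this application; the following is one hedged example of how it might be chosen. If each input node of a hidden layer in the submodel is duplicated k times in the full model, multiplying the copied incoming weights by 1/k leaves every node's weighted sum, and hence the layer's output characteristics, unchanged. The duplication scheme and the coefficient 1/k below are assumptions made for illustration.

import torch

def fill_full_layer(sub_weight: torch.Tensor, dup_factor: int) -> torch.Tensor:
    # Assumed scheme: every input node of the submodel layer appears
    # dup_factor times in the full model, so the copied weights are scaled by
    # the predetermined coefficient 1/dup_factor and repeated along the
    # input-channel dimension; each output node then receives the same
    # weighted sum as in the submodel.
    coeff = 1.0 / dup_factor
    return (coeff * sub_weight).repeat_interleave(dup_factor, dim=1)

# Hypothetical usage for a conv weight of shape (out_ch, in_ch, kh, kw):
# full_weight = fill_full_layer(sub_weight, dup_factor=2)   # -> (out_ch, 2*in_ch, kh, kw)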
In another implementation, the apparatus for training a neural network model may be configured separately from the central processing unit 2100. For example, the apparatus may be configured as a chip connected to the central processing unit 2100, with its functions realized under the control of the central processing unit.
As shown in FIG. 20, the electronic device 2000 may further include a communication module 2110, an input unit 2120, an audio processing unit 2130, a display 2160, and a power supply 2170. It should be noted that the electronic device 2000 does not necessarily include all of the components shown in FIG. 20; furthermore, the electronic device 2000 may also include components not shown in FIG. 20, for which reference may be made to the prior art.
As shown in FIG. 20, the central processing unit 2100, sometimes also referred to as a controller or an operational control, may include a microprocessor or another processor device and/or logic device. The central processing unit 2100 receives input and controls the operation of each component of the electronic device 2000.
The memory 2140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device, and may store a program for carrying out the relevant processing. The central processing unit 2100 may execute the program stored in the memory 2140 to realize information storage, processing, and the like. The functions of the other components are similar to those in the prior art and are not described again here. The components of the electronic device 2000 may be implemented by dedicated hardware, firmware, software, or a combination thereof without departing from the scope of the invention.
An embodiment of the present application further provides a computer-readable program, wherein, when the program is executed in an information processing apparatus or an electronic device, the program causes the information processing apparatus or the electronic device to carry out the method of training a neural network model described in Embodiment 1.
An embodiment of the present application further provides a storage medium storing a computer-readable program, wherein the computer-readable program causes an information processing apparatus or an electronic device to carry out the method of training a neural network model described in Embodiment 1.
The apparatus for training a neural network model described in connection with the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks shown in FIGS. 16-19, and/or one or more combinations of those functional blocks (for example, the ... unit, ... unit, ... unit, and so on), may correspond either to software modules of a computer program flow or to hardware modules. The software modules may correspond to the respective steps shown in Embodiment 1. The hardware modules may be realized, for example, by fixing the software modules in a field-programmable gate array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor so that the processor can read information from, and write information to, the storage medium; or the storage medium may be an integral part of the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of a mobile terminal, or in a memory card insertable into the mobile terminal. For example, if a device (such as a mobile terminal) uses a large-capacity MEGA-SIM card or a large-capacity flash memory device, the software module may be stored in that MEGA-SIM card or flash memory device.
One or more of the functional blocks described with respect to FIGS. 16-19, and/or one or more combinations of those functional blocks, may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described in the present application. One or more of the functional blocks, and/or one or more combinations of the functional blocks, may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, or any other such configuration.
The present application has been described above in conjunction with specific embodiments, but it should be clear to those skilled in the art that these descriptions are exemplary and do not limit the scope of protection of the present application. Those skilled in the art may make various variations and modifications to the present application based on its principles, and such variations and modifications also fall within the scope of the present application.

Claims (13)

  1. A method of training a neural network model, for determining the weights in the neural network model, the method comprising:
    extracting a portion of the neural network model to form a neural network submodel;
    training the neural network submodel to form an optimized neural network submodel;
    initializing the weights in the neural network model according to the weights in the optimized neural network submodel, to form an initialized neural network model, the initialized neural network model having the same output characteristics as the optimized neural network submodel; and
    adjusting the weights in the initialized neural network model based on a known training set.
  2. The method of training a neural network model according to claim 1, wherein extracting a portion of the neural network model comprises:
    converting the relaxed convolution layers in the neural network model into ordinary convolution layers, so as to convert the neural network model into an ordinary convolutional neural network model; and
    extracting some of the neuron nodes in each hidden layer of the ordinary convolutional neural network model to form the neural network submodel.
  3. The method of training a neural network model according to claim 2, wherein initializing the weights in the neural network model to form an initialized neural network model comprises:
    initializing the weights in the ordinary convolutional neural network model according to the weights in the optimized neural network submodel, to form an initialized ordinary convolutional neural network model; and
    converting the ordinary convolution layers in the initialized ordinary convolutional neural network model into relaxed convolution layers, to form the initialized neural network model.
  4. The method of training a neural network model according to claim 2, wherein initializing the weights in the neural network model to form an initialized neural network model comprises:
    initializing the weights in the ordinary convolutional neural network model according to the weights in the optimized neural network submodel, to form an initialized ordinary convolutional neural network model;
    adjusting the weights in the initialized ordinary convolutional neural network model based on the known training set, to form an adjusted ordinary convolutional neural network model; and
    converting the ordinary convolution layers in the adjusted ordinary convolutional neural network model into relaxed convolution layers, to form the initialized neural network model.
  5. The method of training a neural network model according to claim 3, wherein initializing the weights in the ordinary convolutional neural network model to form the initialized ordinary convolutional neural network model comprises:
    initializing the weights of the corresponding hidden layers in the ordinary convolutional neural network model according to the weights of the hidden layers in the optimized neural network submodel, to form the initialized ordinary convolutional neural network model, wherein the output characteristics of each hidden layer of the initialized ordinary convolutional neural network model are the same as the output characteristics of the corresponding hidden layer of the optimized neural network submodel.
  6. The method of training a neural network model according to claim 5, wherein initializing the weights of the corresponding hidden layers in the ordinary convolutional neural network model according to the weights of the hidden layers in the optimized neural network submodel comprises:
    multiplying the weights of a hidden layer in the optimized neural network submodel by a predetermined coefficient, and using the results as the weights of the corresponding hidden layer in the ordinary convolutional neural network model.
  7. An apparatus for training a neural network model, for determining the weights in the neural network model, the apparatus comprising:
    an extraction unit configured to extract a portion of the neural network model to form a neural network submodel;
    a first training unit configured to train the neural network submodel to form an optimized neural network submodel;
    an initialization unit configured to initialize the weights in the neural network model according to the weights in the optimized neural network submodel, to form an initialized neural network model, the initialized neural network model having the same output characteristics as the optimized neural network submodel; and
    a second training unit configured to adjust the weights in the initialized neural network model based on a known training set.
  8. The apparatus for training a neural network model according to claim 7, wherein the extraction unit comprises:
    a first conversion unit configured to convert the relaxed convolution layers in the neural network model into ordinary convolution layers, so as to convert the neural network model into an ordinary convolutional neural network model; and
    an extraction subunit configured to extract some of the neuron nodes in each hidden layer of the ordinary convolutional neural network model to form the neural network submodel.
  9. The apparatus for training a neural network model according to claim 8, wherein the initialization unit comprises:
    a first initialization subunit configured to initialize the weights in the ordinary convolutional neural network model according to the weights in the optimized neural network submodel, to form an initialized ordinary convolutional neural network model; and
    a second conversion unit configured to convert the ordinary convolution layers in the initialized ordinary convolutional neural network model into relaxed convolution layers, to form the initialized neural network model.
  10. The apparatus for training a neural network model according to claim 8, wherein the initialization unit comprises:
    a second initialization subunit configured to initialize the weights in the ordinary convolutional neural network model according to the weights in the optimized neural network submodel, to form an initialized ordinary convolutional neural network model;
    a third training unit configured to adjust the weights in the initialized ordinary convolutional neural network model based on a known training set, to form an adjusted ordinary convolutional neural network model; and
    a third conversion unit configured to convert the ordinary convolution layers in the adjusted ordinary convolutional neural network model into relaxed convolution layers, to form the initialized neural network model.
  11. The apparatus for training a neural network model according to claim 9, wherein the first initialization subunit
    initializes the weights of the corresponding hidden layers in the ordinary convolutional neural network model according to the weights of the hidden layers in the optimized neural network submodel, to form the initialized ordinary convolutional neural network model, wherein the output characteristics of each hidden layer of the initialized ordinary convolutional neural network model are the same as the output characteristics of the corresponding hidden layer of the optimized neural network submodel.
  12. The apparatus for training a neural network model according to claim 11, wherein the first initialization subunit
    multiplies the weights of a hidden layer in the optimized neural network submodel by a predetermined coefficient, and uses the results as the weights of the corresponding hidden layer in the ordinary convolutional neural network model.
  13. An electronic device comprising the apparatus for training a neural network model according to any one of claims 7 to 12.
PCT/CN2016/077975 2016-03-31 2016-03-31 Method and device for training neural network model, and electronic device WO2017166155A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2016/077975 WO2017166155A1 (en) 2016-03-31 2016-03-31 Method and device for training neural network model, and electronic device
JP2018539870A JP6601569B2 (en) 2016-03-31 2016-03-31 Neural network model training method, apparatus, and electronic apparatus
CN201680061886.8A CN108140144B (en) 2016-03-31 2016-03-31 Method and device for training neural network model and electronic equipment
KR1020187017577A KR102161902B1 (en) 2016-03-31 2016-03-31 Training methods, devices and electronics for neural network models

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/077975 WO2017166155A1 (en) 2016-03-31 2016-03-31 Method and device for training neural network model, and electronic device

Publications (1)

Publication Number Publication Date
WO2017166155A1 true WO2017166155A1 (en) 2017-10-05

Family

ID=59962416

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/077975 WO2017166155A1 (en) 2016-03-31 2016-03-31 Method and device for training neural network model, and electronic device

Country Status (4)

Country Link
JP (1) JP6601569B2 (en)
KR (1) KR102161902B1 (en)
CN (1) CN108140144B (en)
WO (1) WO2017166155A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034367A (en) * 2018-08-22 2018-12-18 广州杰赛科技股份有限公司 Neural network update method, device, computer equipment and readable storage medium storing program for executing
CN109919308A (en) * 2017-12-13 2019-06-21 腾讯科技(深圳)有限公司 A kind of neural network model dispositions method, prediction technique and relevant device
CN110738648A (en) * 2019-10-12 2020-01-31 山东浪潮人工智能研究院有限公司 camera shell paint spraying detection system and method based on multilayer convolutional neural network

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165738B (en) * 2018-09-19 2021-09-14 北京市商汤科技开发有限公司 Neural network model optimization method and device, electronic device and storage medium
CN110288084A (en) * 2019-06-06 2019-09-27 北京小米智能科技有限公司 Super-network training method and device
US11347308B2 (en) 2019-07-26 2022-05-31 Samsung Electronics Co., Ltd. Method and apparatus with gaze tracking
KR102149495B1 (en) * 2019-08-19 2020-08-28 고려대학교 산학협력단 Optimization apparatus for training conditions of environmental prediction model and operating thereof
CN112561026A (en) * 2019-09-25 2021-03-26 北京地平线机器人技术研发有限公司 Neural network model training method and device, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5452400A (en) * 1991-08-30 1995-09-19 Mitsubishi Denki Kabushiki Kaisha Method of optimizing a combination using a neural network
CN104143327A (en) * 2013-07-10 2014-11-12 腾讯科技(深圳)有限公司 Acoustic model training method and device
WO2015054264A1 (en) * 2013-10-08 2015-04-16 Google Inc. Methods and apparatus for reinforcement learning
CN104700153A (en) * 2014-12-05 2015-06-10 江南大学 PH (potential of hydrogen) value predicting method of BP (back propagation) neutral network based on simulated annealing optimization
CN104794527A (en) * 2014-01-20 2015-07-22 富士通株式会社 Method and equipment for constructing classification model based on convolutional neural network
WO2015118686A1 (en) * 2014-02-10 2015-08-13 三菱電機株式会社 Hierarchical neural network device, learning method for determination device, and determination method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002042107A (en) * 2000-07-31 2002-02-08 Fuji Electric Co Ltd Learning method for neural network
CN100595780C (en) * 2007-12-13 2010-03-24 中国科学院合肥物质科学研究院 Handwriting digital automatic identification method based on module neural network SN9701 rectangular array
CN101975092B (en) * 2010-11-05 2012-08-15 中北大学 Real-time prediction method of mine gas concentration in short and medium terms based on radial basis function neural network integration
CN102479339B (en) * 2010-11-24 2014-07-16 香港理工大学 Method and system for forecasting short-term wind speed of wind farm based on hybrid neural network
JP6042274B2 (en) * 2013-06-28 2016-12-14 株式会社デンソーアイティーラボラトリ Neural network optimization method, neural network optimization apparatus and program
CN104346622A (en) * 2013-07-31 2015-02-11 富士通株式会社 Convolutional neural network classifier, and classifying method and training method thereof
CN104751228B (en) * 2013-12-31 2018-04-27 科大讯飞股份有限公司 Construction method and system for the deep neural network of speech recognition
US10832138B2 (en) * 2014-11-27 2020-11-10 Samsung Electronics Co., Ltd. Method and apparatus for extending neural network
CN104978601B (en) * 2015-06-26 2017-08-25 深圳市腾讯计算机系统有限公司 neural network model training system and method
CN105184312B (en) * 2015-08-24 2018-09-25 中国科学院自动化研究所 A kind of character detecting method and device based on deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5452400A (en) * 1991-08-30 1995-09-19 Mitsubishi Denki Kabushiki Kaisha Method of optimizing a combination using a neural network
CN104143327A (en) * 2013-07-10 2014-11-12 腾讯科技(深圳)有限公司 Acoustic model training method and device
WO2015054264A1 (en) * 2013-10-08 2015-04-16 Google Inc. Methods and apparatus for reinforcement learning
CN104794527A (en) * 2014-01-20 2015-07-22 富士通株式会社 Method and equipment for constructing classification model based on convolutional neural network
WO2015118686A1 (en) * 2014-02-10 2015-08-13 三菱電機株式会社 Hierarchical neural network device, learning method for determination device, and determination method
CN104700153A (en) * 2014-12-05 2015-06-10 江南大学 PH (potential of hydrogen) value predicting method of BP (back propagation) neutral network based on simulated annealing optimization

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919308A (en) * 2017-12-13 2019-06-21 腾讯科技(深圳)有限公司 A kind of neural network model dispositions method, prediction technique and relevant device
CN109919308B (en) * 2017-12-13 2022-11-11 腾讯科技(深圳)有限公司 Neural network model deployment method, prediction method and related equipment
CN109034367A (en) * 2018-08-22 2018-12-18 广州杰赛科技股份有限公司 Neural network update method, device, computer equipment and readable storage medium storing program for executing
CN110738648A (en) * 2019-10-12 2020-01-31 山东浪潮人工智能研究院有限公司 camera shell paint spraying detection system and method based on multilayer convolutional neural network

Also Published As

Publication number Publication date
KR102161902B1 (en) 2020-10-05
CN108140144B (en) 2021-06-01
CN108140144A (en) 2018-06-08
JP2019508803A (en) 2019-03-28
JP6601569B2 (en) 2019-11-06
KR20180084969A (en) 2018-07-25

Similar Documents

Publication Publication Date Title
WO2017166155A1 (en) Method and device for training neural network model, and electronic device
WO2019120110A1 (en) Image reconstruction method and device
WO2022042123A1 (en) Image recognition model generation method and apparatus, computer device and storage medium
WO2018018470A1 (en) Method, apparatus and device for eliminating image noise and convolutional neural network
EP4163831A1 (en) Neural network distillation method and device
WO2021164269A1 (en) Attention mechanism-based disparity map acquisition method and apparatus
CN110706214B (en) Three-dimensional U-Net brain tumor segmentation method fusing condition randomness and residual error
WO2020238039A1 (en) Neural network search method and apparatus
WO2022166797A1 (en) Image generation model training method, generation method, apparatus, and device
CN112861659A (en) Image model training method and device, electronic equipment and storage medium
US20210173895A1 (en) Apparatus and method of performing matrix multiplication operation of neural network
CN110506280B (en) Neural network training system, method and computer readable storage medium
CN114463605A (en) Continuous learning image classification method and device based on deep learning
CN112598012B (en) Data processing method in neural network model, storage medium and electronic device
CN107564013B (en) Scene segmentation correction method and system fusing local information
CN111414823B (en) Human body characteristic point detection method and device, electronic equipment and storage medium
CN113674156A (en) Method and system for reconstructing image super-resolution
CN109711454B (en) Feature matching method based on convolutional neural network
WO2022199148A1 (en) Classification model training method, image classification method, electronic device and storage medium
WO2022111231A1 (en) Cnn training method, electronic device, and computer readable storage medium
CN114677535A (en) Training method of domain-adaptive image classification network, image classification method and device
CN110751259A (en) Network layer operation method and device in deep neural network
CN110852348B (en) Feature map processing method, image processing method and device
TWI763975B (en) System and method for reducing computational complexity of artificial neural network
CN108734222B (en) Convolutional neural network image classification method based on correction network

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 20187017577

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2018539870

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16895936

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16895936

Country of ref document: EP

Kind code of ref document: A1