CN107609645B - Method and apparatus for training convolutional neural network - Google Patents

Method and apparatus for training convolutional neural network

Info

Publication number
CN107609645B
CN107609645B CN201710859122.XA
Authority
CN
China
Prior art keywords
input information
layer
neural network
convolutional neural
training
Prior art date
Legal status
Active
Application number
CN201710859122.XA
Other languages
Chinese (zh)
Other versions
CN107609645A (en)
Inventor
刘文献
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201710859122.XA priority Critical patent/CN107609645B/en
Publication of CN107609645A publication Critical patent/CN107609645A/en
Application granted granted Critical
Publication of CN107609645B publication Critical patent/CN107609645B/en


Landscapes

  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a method and apparatus for training a convolutional neural network. One embodiment of the method comprises the following steps: for each layer of the initialized convolutional neural network, storing that layer's input information set across at least one graphics card; calculating the mean and variance of the part of the layer's input information set stored on each graphics card; sending each card's mean and variance to the other graphics cards so that the mean and variance of the layer's full input information set can be calculated; normalizing the layer's input information set with that mean and variance to obtain the layer's normalized input information set; and training the initialized convolutional neural network with each layer's normalized input information set to obtain the trained convolutional neural network. This embodiment improves the stability of the convolutional neural network.

Description

Method and apparatus for training convolutional neural network
Technical Field
The present application relates to the field of computer technology, and in particular, to the field of internet technology, and more particularly, to a method and apparatus for training a convolutional neural network.
Background
A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to units within a local receptive field; CNNs perform exceptionally well in large-scale image processing.
However, as the depth of a convolutional neural network increases, the distributions of the input information sets used to train its layers diverge, making the trained network unstable. How to improve the stability of convolutional neural networks is therefore an urgent problem to be solved.
Disclosure of Invention
An object of the embodiments of the present application is to propose an improved method and apparatus for training convolutional neural networks that solve the technical problem mentioned in the Background section above.
In a first aspect, embodiments of the present application provide a method for training a convolutional neural network, the method comprising: for each layer of the initialized convolutional neural network, storing that layer's input information set across at least one graphics card, where each of the graphics cards stores at least part of the layer's input information set; calculating the mean and variance of the part of the layer's input information set stored on each graphics card; sending each card's mean and variance to the other graphics cards so as to calculate the mean and variance of the layer's full input information set; normalizing the layer's input information set with that mean and variance to obtain the layer's normalized input information set; and training the initialized convolutional neural network with each layer's normalized input information set to obtain the trained convolutional neural network.
In some embodiments, training the initialized convolutional neural network with each layer's normalized input information set to obtain the trained convolutional neural network comprises performing the following training step: inputting each layer's normalized input information set into the corresponding layer of the initialized convolutional neural network to obtain a feature vector set, determining whether the feature vector set meets a preset condition, and, if so, taking the initialized convolutional neural network as the trained convolutional neural network; and, in response to the preset condition not being met, adjusting the parameters of the initialized convolutional neural network and continuing to perform the training step.
In some embodiments, the input information set includes a plurality of input information items of the same category, and determining whether the feature vector set meets the preset condition comprises: calculating the pairwise distances among the feature vectors corresponding to the input information items of the same category to obtain a first calculation result; and determining, based on the first calculation result, whether the preset condition is met.
In some embodiments, calculating the pairwise distances among the feature vectors corresponding to the input information items of the same category comprises calculating the Euclidean distances among those feature vectors to obtain the first calculation result.
In some embodiments, determining whether the preset condition is met based on the first calculation result comprises: determining whether the Euclidean distances among all the feature vectors corresponding to the input information items of the same category are smaller than a first preset distance threshold; if all of them are smaller than the first preset distance threshold, the preset condition is met; if not all of them are smaller than the first preset distance threshold, the preset condition is not met.
In some embodiments, the input information set includes a plurality of input information items of different categories, and determining whether the feature vector set meets the preset condition comprises: calculating the pairwise distances among the feature vectors corresponding to the input information items of different categories to obtain a second calculation result; and determining, based on the second calculation result, whether the preset condition is met.
In some embodiments, calculating the pairwise distances among the feature vectors corresponding to the input information items of different categories comprises calculating the Euclidean distances among those feature vectors to obtain the second calculation result.
In some embodiments, determining whether the preset condition is met based on the second calculation result comprises: determining whether the Euclidean distances among all the feature vectors corresponding to the input information items of different categories are larger than a second preset distance threshold; if all of them are larger than the second preset distance threshold, the preset condition is met; if not all of them are larger than the second preset distance threshold, the preset condition is not met.
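The two threshold checks described in these embodiments can be sketched together as follows. This is an illustrative sketch only, not part of the claimed embodiments; the function name, the Python representation of feature vectors as lists, and the threshold parameter names are all assumptions for illustration.

```python
import math

def preset_condition_met(same_class_vecs, diff_class_vecs, t_intra, t_inter):
    # First preset condition: every pairwise Euclidean distance among
    # feature vectors of the same category must be smaller than the
    # first preset distance threshold (t_intra).
    n = len(same_class_vecs)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(same_class_vecs[i], same_class_vecs[j]) >= t_intra:
                return False
    # Second preset condition: every pairwise Euclidean distance among
    # feature vectors of different categories must be larger than the
    # second preset distance threshold (t_inter).
    m = len(diff_class_vecs)
    for i in range(m):
        for j in range(i + 1, m):
            if math.dist(diff_class_vecs[i], diff_class_vecs[j]) <= t_inter:
                return False
    return True
```

A feature vector set passing both checks is compact within each category and well separated across categories, which is the training-termination criterion described above.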
In some embodiments, the method further comprises: acquiring first input information and second input information; inputting the first input information and the second input information into the trained convolutional neural network to obtain a feature vector of the first input information and a feature vector of the second input information; calculating the distance between the two feature vectors; and determining, based on the calculated distance, whether the first input information and the second input information belong to the same category, and outputting the determination result.
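The same-category determination in this embodiment can be sketched as below. The `embed` callable stands in for the trained convolutional neural network and the threshold value is an assumption; neither is specified by the patent.

```python
import math

def same_category(embed, first_input, second_input, dist_threshold):
    # Map both inputs to feature vectors with the trained network
    # (here any callable returning a sequence of floats), then compare
    # their Euclidean distance against a threshold.
    v1 = embed(first_input)
    v2 = embed(second_input)
    return math.dist(v1, v2) < dist_threshold
```

With an identity embedding, for example, `same_category(lambda x: x, [0.0, 0.0], [0.3, 0.4], 1.0)` returns `True` because the distance is 0.5.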
In a second aspect, embodiments of the present application provide an apparatus for training a convolutional neural network, the apparatus comprising: a normalization unit configured, for each layer of the initialized convolutional neural network, to store that layer's input information set across at least one graphics card, where each of the graphics cards stores at least part of the layer's input information set; to calculate the mean and variance of the part of the layer's input information set stored on each graphics card; to send each card's mean and variance to the other graphics cards so as to calculate the mean and variance of the layer's full input information set; and to normalize the layer's input information set with that mean and variance to obtain the layer's normalized input information set; and a training unit configured to train the initialized convolutional neural network with each layer's normalized input information set to obtain the trained convolutional neural network.
In some embodiments, the training unit comprises: a training subunit configured to perform the following training step: inputting each layer's normalized input information set into the corresponding layer of the initialized convolutional neural network to obtain a feature vector set, determining whether the feature vector set meets a preset condition, and, if so, taking the initialized convolutional neural network as the trained convolutional neural network; and an adjustment subunit configured, in response to the preset condition not being met, to adjust the parameters of the initialized convolutional neural network and continue performing the training step.
In some embodiments, the input information set includes a plurality of input information items of the same category, and the training subunit comprises: a first calculation module configured to calculate the pairwise distances among the feature vectors corresponding to the input information items of the same category to obtain a first calculation result; and a first determining module configured to determine, based on the first calculation result, whether the preset condition is met.
In some embodiments, the first calculation module is further configured to calculate the Euclidean distances among the feature vectors corresponding to the input information items of the same category to obtain the first calculation result.
In some embodiments, the first determining module is further configured to: determine whether the Euclidean distances among all the feature vectors corresponding to the input information items of the same category are smaller than a first preset distance threshold; if all of them are smaller than the first preset distance threshold, the preset condition is met; if not all of them are smaller than the first preset distance threshold, the preset condition is not met.
In some embodiments, the input information set includes a plurality of input information items of different categories, and the training subunit comprises: a second calculation module configured to calculate the pairwise distances among the feature vectors corresponding to the input information items of different categories to obtain a second calculation result; and a second determining module configured to determine, based on the second calculation result, whether the preset condition is met.
In some embodiments, the second calculation module is further configured to calculate the Euclidean distances among the feature vectors corresponding to the input information items of different categories to obtain the second calculation result.
In some embodiments, the second determining module is further configured to: determine whether the Euclidean distances among all the feature vectors corresponding to the input information items of different categories are larger than a second preset distance threshold; if all of them are larger than the second preset distance threshold, the preset condition is met; if not all of them are larger than the second preset distance threshold, the preset condition is not met.
In some embodiments, the apparatus further comprises: an acquisition unit configured to acquire first input information and second input information; an input unit configured to input the first input information and the second input information into the trained convolutional neural network to obtain a feature vector of the first input information and a feature vector of the second input information; a calculation unit configured to calculate the distance between the two feature vectors; and a determining unit configured to determine, based on the calculated distance, whether the first input information and the second input information belong to the same category, and to output the determination result.
In a third aspect, embodiments of the present application provide a server, including: one or more processors; a storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
According to the method and apparatus for training a convolutional neural network provided by the embodiments of the present application, for each layer of the convolutional neural network, the mean and variance of the part of the layer's input information set stored on each graphics card are calculated, and each card's mean and variance are sent to the other graphics cards so that the mean and variance of the layer's full input information set can be calculated; the layer's input information set is then normalized with that mean and variance to obtain the layer's normalized input information set; finally, the initialized convolutional neural network is trained with each layer's normalized input information set to obtain the trained convolutional neural network. Because the input information set of every layer is normalized with its own mean and variance, the normalized input information sets of all layers share the same distribution, which improves their stability and thus the stability of the trained convolutional neural network.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings, in which:
FIG. 1 is an exemplary system architecture diagram in which embodiments of the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method for training a convolutional neural network in accordance with the present application;
FIG. 3 is a decomposition flowchart of the step, in the flowchart of FIG. 2, of training the initialized convolutional neural network with each layer's normalized input information set;
FIG. 4 is a schematic structural diagram of one embodiment of an apparatus for training a convolutional neural network in accordance with the present application;
FIG. 5 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which the methods for training a convolutional neural network or apparatus for training a convolutional neural network of the present application may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be a variety of electronic devices including, but not limited to, smartphones, tablets, laptop portable computers, desktop computers, and the like.
The server 105 may provide various services, for example, the server 105 may obtain a set of input information from the terminal devices 101, 102, 103 through the network 104 to enable training of the initialized convolutional neural network and obtain a trained convolutional neural network.
It should be noted that, the method for training the convolutional neural network provided in the embodiments of the present application is generally performed by the server 105, and accordingly, the apparatus for training the convolutional neural network is generally disposed in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation. In the case where the input information set is already stored on the server 105, the system architecture 100 may omit the terminal devices 101, 102, 103.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for training a convolutional neural network in accordance with the present application is shown. The method for training the convolutional neural network comprises the following steps:
step 201, for each layer of the initialized convolutional neural network, storing an input information set of the layer by using at least one display card.
In this embodiment, the electronic device (e.g., server 105 shown in fig. 1) on which the method for training the convolutional neural network operates may store the input information set of each layer of the initialized convolutional neural network on at least one graphics card. As the depth of a convolutional neural network increases, so does the amount of data in the input information sets used to train it; therefore, at least one graphics card is typically provided in the electronic device to store the input information set of each layer. Each of the graphics cards may store at least part of the layer's input information set. Specifically, for each layer of the initialized convolutional neural network, the layer's input information set may be divided into several parts, with each graphics card storing one part. As an example, if the input information set of the initialized convolutional neural network contains a total of m×n input information items, the set may be divided into n subsets of m items each, with each of n graphics cards storing one subset, where m and n are positive integers.
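The m×n partitioning described in the example above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function name and the even-division assumption are introduced here for illustration.

```python
def shard_inputs(input_set, n_cards):
    # Divide a layer's input information set into n contiguous subsets,
    # one per graphics card, as in the m*n example above.
    m, rem = divmod(len(input_set), n_cards)
    assert rem == 0, "this sketch assumes the set divides evenly"
    return [input_set[i * m:(i + 1) * m] for i in range(n_cards)]
```

For instance, 12 input items shared across 3 cards yield 3 subsets of 4 items each.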
In this embodiment, the electronic device on which the method for training the convolutional neural network operates may be a server, in which at least one graphics card is installed; it may also be a server cluster, with at least one graphics card installed in each server of the cluster. As an example, the electronic device is a server cluster with 4 graphics cards in each server.
In this embodiment, the convolutional neural network may be a feed-forward neural network whose artificial neurons respond to units within a local receptive field, performing exceptionally well in large-scale image processing. Typically, the basic structure of a convolutional neural network includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, from which local features are extracted; once a local feature is extracted, its positional relationship to the other features is also determined. The second is the feature mapping layer: each computation layer of the network consists of a plurality of feature maps, each feature map is a plane, and all neurons on a plane share equal weights. As an example, the convolutional neural network may be AlexNet. AlexNet is an existing convolutional neural network structure that Alex Krizhevsky, a student of Geoffrey Hinton, used in the ImageNet competition (ImageNet is currently the world's largest image recognition database). Typically, AlexNet comprises 8 layers: the first 5 are convolutional layers and the last 3 are fully connected layers. As another example, the convolutional neural network may be GoogLeNet. GoogLeNet is also an existing convolutional neural network structure and was the champion model of the 2014 ImageNet competition; its basic components are similar to AlexNet's, and it is a 22-layer model.
Step 202: calculate the mean and variance of the part of the layer's input information set stored on each graphics card.
In this embodiment, the electronic device may calculate, for each of the graphics cards, the mean and variance of the part of the layer's input information set stored on that card.
Step 203: send the mean and variance of the part of the layer's input information set stored on each graphics card to the other graphics cards, so as to calculate the mean and variance of the layer's full input information set.
In this embodiment, the electronic device may send the mean and variance of the part of the layer's input information set stored on each graphics card to the other graphics cards, so that every card also holds the means and variances of the parts stored on the other cards; each card can then calculate the mean and variance of the layer's full input information set from the per-card means and variances. As an example, the electronic device may take the mean of the per-card means as the mean of the layer's input information set, and the variance computed over the per-card means as the variance of the layer's input information set.
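The aggregation in the example above can be sketched as follows, assuming every card holds an equally sized shard. Note the example in the patent takes only the variance of the per-card means; the sketch below additionally adds the mean of the per-card variances, which by the law of total variance gives the exact layer-wide variance for equal shards. The function name and this refinement are assumptions introduced here, not the patent's wording.

```python
from statistics import fmean, pvariance

def aggregate_stats(card_means, card_vars):
    # Global mean: mean of the per-card means (exact for equal shards).
    global_mean = fmean(card_means)
    # Global variance: within-shard variance plus between-shard variance.
    global_var = fmean(card_vars) + pvariance(card_means, mu=global_mean)
    return global_mean, global_var
```

For example, the data [1, 2, 3] on one card and [4, 5, 6] on another give per-card means 2 and 5 and per-card variances 2/3 each; the aggregate is mean 3.5 and variance 35/12, matching the statistics of the pooled data.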
Step 204: normalize the layer's input information set with the mean and variance of that set to obtain the layer's normalized input information set.
In this embodiment, the electronic device may normalize the layer's input information set with the mean and variance of that set, thereby obtaining the layer's normalized input information set.
In this embodiment, during training of the convolutional neural network, the initial parameters of each layer are continually adjusted, so the distribution of each subsequent layer's input information set also changes; yet the training process needs the input information set of every layer to keep the same distribution, which is why the electronic device normalizes each layer's input information set. Here, a BN (Batch Normalization) layer may be inserted after each layer of the initialized convolutional neural network to normalize that layer's input information set to mean 0 and variance 1.
Step 205: train the initialized convolutional neural network with each layer's normalized input information set to obtain the trained convolutional neural network.
In this embodiment, the electronic device may train the initialized convolutional neural network with each layer's normalized input information set, thereby obtaining the trained convolutional neural network.
In this embodiment, the trained convolutional neural network may be used to characterize the correspondence between input information and the feature vector of that input information. The electronic device may train the convolutional neural network in a variety of ways.
As an example, the electronic device may feed the input information set in at the input side of the initialized convolutional neural network, pass it in turn through each layer and through the BN layer that follows each layer, and read the result out at the output side. Here, the BN layer after each layer normalizes the input information set of the next layer, and each layer's parameter matrix processes (e.g., by product or convolution) that layer's normalized input information set. The initialized convolutional neural network stores initial parameters, which are continually adjusted during training until the output feature vector set of the network satisfies the preset constraint condition.
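The adjust-until-the-condition-holds loop in this example can be sketched generically as follows. All names here (`network`, `condition_met`, `adjust`, `max_steps`) are placeholders introduced for illustration; the patent does not prescribe a particular adjustment rule.

```python
def train_until_condition(network, inputs, condition_met, adjust, max_steps=1000):
    # Repeatedly: compute the feature vectors, test the preset
    # condition, and adjust the network's parameters if it fails.
    for _ in range(max_steps):
        features = [network(x) for x in inputs]
        if condition_met(features):
            return network, True
        network = adjust(network, features)
    return network, False
```

With a toy one-parameter "network" and a simple increment as the adjustment, the loop terminates as soon as the condition on the feature set holds.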
As another example, the electronic device may compile statistics over a large number of normalized input information items and their feature vectors, generate a correspondence table storing the correspondences between those input information items and feature vectors, and take that correspondence table as the trained convolutional neural network.
In this embodiment, the trained convolutional neural network can be applied in various scenarios. Optionally, the electronic device may first acquire first input information and second input information; input them into the trained convolutional neural network to obtain a feature vector for each; then calculate the distance between the two feature vectors; and finally determine, based on the calculated distance, whether the first and second input information belong to the same category, and output the determination result.
As one example, the electronic device may calculate the Euclidean distance between the feature vector of the first input information and the feature vector of the second input information. The Euclidean distance, also called the Euclidean metric, generally refers to the true distance between two points in m-dimensional space, or the natural length of a vector (i.e., the distance from the point to the origin); in two and three dimensions it is the actual distance between two points. In general, the smaller the Euclidean distance between two vectors, the more likely the corresponding input information items belong to the same category; the larger the distance, the less likely they do.
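The m-dimensional Euclidean distance described above can be written out directly (the function name is chosen here for illustration):

```python
import math

def euclidean_distance(u, v):
    # True straight-line distance between two points in m-dimensional
    # space: the square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

For example, the distance between (0, 0) and (3, 4) is 5.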
As another example, the electronic device may calculate the cosine distance between the feature vector of the first input information and the feature vector of the second input information. The cosine distance, also called cosine similarity, estimates similarity from the cosine of the angle between two vectors. In general, the smaller the angle between two vectors, the closer the cosine is to 1 and the higher the similarity, so the more likely the corresponding input information items belong to the same category; the larger the angle, the further the cosine deviates from 1 and the lower the similarity, so the less likely they belong to the same category.
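The cosine similarity described above can be sketched as follows (the function name is chosen here for illustration; zero vectors are not handled):

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: dot product divided by
    # the product of their Euclidean norms. Values near 1 indicate the
    # corresponding inputs likely share a category.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Parallel vectors give 1.0; orthogonal vectors give 0.0.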
According to the method for training a convolutional neural network provided by the above embodiment, for each layer of the convolutional neural network, the mean and variance of the partial input information set of that layer stored on each of the at least one display card are calculated and sent to the other display cards, so that the mean and variance of the complete input information set of the layer can be calculated on every card; the input information set of the layer is then normalized using this mean and variance to obtain the normalized input information set of the layer; finally, the initialized convolutional neural network is trained with the normalized input information sets of the layers to obtain the trained convolutional neural network. Normalizing the input information set of each layer with its mean and variance improves the stability of the normalized input information sets of the layers, and thereby improves the stability of the trained convolutional neural network.
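The cross-card aggregation of statistics described above can be sketched as follows. This is a minimal Python illustration (the embodiment does not prescribe an implementation), using the standard parallel-statistics identity that each card's E[x²] equals its variance plus the square of its mean; the `(count, mean, variance)` triples are hypothetical per-card values:

```python
def combine_card_statistics(card_stats):
    """Combine per-card (count, mean, variance) triples into the global
    mean and variance of the full input information set of a layer.

    Each display card broadcasts its partial statistics to the others;
    every card can then run this same reduction locally and obtain
    identical global values without exchanging the raw inputs.
    """
    total = sum(n for n, _, _ in card_stats)
    global_mean = sum(n * m for n, m, _ in card_stats) / total
    # Per-card E[x^2] is var + mean^2; global variance = E[x^2] - mean^2.
    global_sq = sum(n * (v + m * m) for n, m, v in card_stats) / total
    return global_mean, global_sq - global_mean ** 2
```

For example, one card holding `[1, 2, 3]` and another holding `[4, 5, 6, 7]` yield triples `(3, 2.0, 2/3)` and `(4, 5.5, 1.25)`, and the reduction returns mean 4.0 and variance 4.0, matching a direct computation over all seven values.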
In an alternative way of training the convolutional neural network, the step of training the initialized convolutional neural network with the normalized input information set of each layer in the flowchart of fig. 2 may be broken down into multiple sub-steps. Referring specifically to fig. 3, there is shown a decomposition flow 300 of this step. In fig. 3, the step is decomposed into the following four sub-steps: step 301, step 302, step 303 and step 304.
Step 301, inputting the normalized input information set of each layer to each layer of the initialized convolutional neural network to obtain a feature vector set.
In this embodiment, the electronic device may input the normalized input information set of each layer into the initialized convolutional neural network to obtain a feature vector set. Specifically, the electronic device may feed the input information set in from the input side of the initialized convolutional neural network, pass it in turn through each layer of the network and through the BN layer following each layer, and read the result out from the output side. Here, the BN layer following each layer normalizes the input information set of the next layer, while the parameter matrix of each layer processes (e.g., by product or convolution) the normalized input information set of that layer.
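The per-element normalization a BN layer applies with the globally aggregated statistics can be sketched as below. This is a minimal illustration; the small `eps` term guarding against division by zero is an assumption, as the embodiment does not specify one:

```python
def normalize(inputs, mean, var, eps=1e-5):
    """Shift and scale each input to roughly zero mean and unit variance
    using the mean and variance aggregated across all display cards."""
    return [(x - mean) / (var + eps) ** 0.5 for x in inputs]
```

Applied to a batch with its own mean and variance, the output has mean zero and variance close to one, which is what stabilizes the input distribution seen by the next layer.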
Step 302, determining whether the feature vector set satisfies a preset condition.
In this embodiment, based on the feature vector set obtained in step 301, the electronic device may determine whether the feature vector set meets a preset condition; if the preset condition is met, step 303 is performed; if not, step 304 is performed. Specifically, the electronic device may first identify certain regularities exhibited by the feature vector set, and then determine whether these regularities conform to a preset rule: if they do, the preset condition is met; if they do not, the preset condition is not met.
In some optional implementations of this embodiment, the electronic device may determine whether the set of feature vectors meets the preset condition by at least one of:
1. first, distances among the feature vectors corresponding to the input information of the same category are calculated, and a first calculation result is obtained. Wherein the set of input information may comprise a plurality of input information of the same category.
As an example, the electronic device may calculate the Euclidean distances between the feature vectors corresponding to the plurality of input information of the same category to obtain the first calculation result.
As another example, the electronic device may calculate the cosine distances between the feature vectors corresponding to the plurality of input information of the same category to obtain the first calculation result.
Then, based on the first calculation result, it is determined whether a preset condition is satisfied.
As an example, the electronic device may determine whether the Euclidean distances between the feature vectors corresponding to the plurality of input information of the same category are all smaller than a first preset distance threshold; if they all are, the preset condition is met; if not all of them are, the preset condition is not met.
As another example, the electronic device may compare the cosine distance between each pair of feature vectors corresponding to the plurality of input information of the same category with 1; if every cosine distance is close to 1, the preset condition is met; if any deviates from 1, the preset condition is not met.
2. First, distances among the feature vectors corresponding to the input information of the different categories are calculated, and a second calculation result is obtained. Wherein the set of input information may comprise a plurality of different categories of input information.
As an example, the electronic device may calculate the Euclidean distances between the feature vectors corresponding to the plurality of input information of different categories to obtain the second calculation result.
As another example, the electronic device may calculate cosine distances between respective feature vectors of a plurality of feature vectors corresponding to a plurality of different categories of input information to obtain the second calculation result.
Then, based on the second calculation result, it is determined whether a preset condition is satisfied.
As an example, the electronic device may determine whether the Euclidean distances between the feature vectors corresponding to the plurality of input information of different categories are all greater than a second preset distance threshold; if they all are, the preset condition is met; if not all of them are, the preset condition is not met.
As another example, the electronic device may compare the cosine distance between each pair of feature vectors corresponding to the plurality of input information of different categories with 1; if every cosine distance deviates from 1, the preset condition is met; if any is close to 1, the preset condition is not met.
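The two kinds of checks above can be combined into a single predicate. The following is a hedged Python sketch (function and threshold names are hypothetical, not from the embodiment): same-category pairs must all be closer than one threshold, and different-category pairs must all be farther apart than another:

```python
import math

def euclidean(v1, v2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def meets_preset_condition(same_pairs, diff_pairs, d_same_max, d_diff_min):
    """True when every same-category pair of feature vectors is closer
    than d_same_max AND every different-category pair is farther apart
    than d_diff_min."""
    return (all(euclidean(a, b) < d_same_max for a, b in same_pairs) and
            all(euclidean(a, b) > d_diff_min for a, b in diff_pairs))
```

When this predicate returns True, training would stop (step 303); otherwise the parameters would be adjusted and the forward pass repeated (step 304).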
Step 303, using the initialized convolutional neural network as the convolutional neural network after training.
In this embodiment, when the preset condition is met, it indicates that training of the convolutional neural network is complete, and the electronic device may use the initialized convolutional neural network as the trained convolutional neural network. The trained convolutional neural network makes the distance between feature vectors of input information of the same category as small as possible, and the distance between feature vectors of input information of different categories as large as possible.
Step 304, adjusting parameters of the initialized convolutional neural network.
In this embodiment, under the condition that the preset condition is not met, the electronic device may adjust the parameters of the initialized convolutional neural network, and return to execute step 301 until the convolutional neural network meeting the preset condition is trained.
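The loop formed by steps 301-304 can be sketched as follows. `ToyNetwork` and its single scalar parameter are purely illustrative stand-ins, not the actual convolutional neural network or its parameter-adjustment rule:

```python
class ToyNetwork:
    """Purely illustrative stand-in for the initialized network."""
    def __init__(self):
        self.scale = 4.0

    def forward(self, inputs):
        # Step 301: produce a (toy) feature vector set from the inputs.
        return [x * self.scale for x in inputs]

def train(net, inputs, meets_condition, adjust, max_iters=100):
    for _ in range(max_iters):
        features = net.forward(inputs)     # step 301
        if meets_condition(features):      # step 302
            return net                     # step 303: training is complete
        adjust(net)                        # step 304: adjust parameters, retry
    raise RuntimeError("preset condition never met")

# Toy condition: every feature value must lie in [0, 1];
# toy adjustment: halve the network's single parameter.
net = train(ToyNetwork(), [0.5, 0.9],
            meets_condition=lambda fv: all(0.0 <= v <= 1.0 for v in fv),
            adjust=lambda n: setattr(n, "scale", n.scale / 2))
```

After two adjustments the toy condition is satisfied and the loop returns the network, mirroring how step 304 loops back to step 301 until the preset condition holds.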
With further reference to fig. 4, as an implementation of the method shown in the foregoing figures, the present application provides an embodiment of an apparatus for training a convolutional neural network, where the apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus is particularly applicable to various electronic devices.
As shown in fig. 4, the apparatus 400 for training a convolutional neural network of the present embodiment may include: a normalization unit 401 and a training unit 402. The normalization unit 401 is configured to, for each layer of the initialized convolutional neural network, store an input information set of the layer using at least one display card, where each of the at least one display card stores at least part of the input information set of the layer; calculate the mean value and variance of the at least part of the input information set of the layer stored on each of the at least one display card; send the mean value and the variance of the at least part of the input information set of the layer stored on each display card to the other display cards so as to calculate the mean value and the variance of the input information set of the layer; and normalize the input information set of the layer using the mean value and the variance of the input information set of the layer to obtain a normalized input information set of the layer. The training unit 402 is configured to train the initialized convolutional neural network using the normalized input information set of each layer to obtain a trained convolutional neural network.
In this embodiment, in the apparatus 400 for training a convolutional neural network: the specific processing of the normalization unit 401 and the training unit 402 and the technical effects thereof may refer to the relevant descriptions of steps 201 to 204 and step 205 in the corresponding embodiment of fig. 2, and are not described herein again.
In some alternative implementations of the present embodiment, the training unit 402 may include: a training subunit (not shown in the figures) configured to perform the following training steps: inputting the normalized input information set of each layer into each layer of the initialized convolutional neural network to obtain a feature vector set, determining whether the feature vector set meets preset conditions, and if so, taking the initialized convolutional neural network as a convolutional neural network with the training completed; an adjustment subunit (not shown in the figure) configured to adjust the parameters of the initialized convolutional neural network and to continue the training step in response to the preset condition not being met.
In some alternative implementations of the present embodiment, the set of input information may include a plurality of the same categories of input information; the training subunit may include: a first calculation module (not shown in the figure) configured to calculate a distance between each of a plurality of feature vectors corresponding to a plurality of input information of the same category, to obtain a first calculation result; a first determining module (not shown in the figure) configured to determine whether a preset condition is satisfied based on the first calculation result.
In some optional implementations of the present embodiment, the first computing module may be further configured to: and calculating Euclidean distances among the feature vectors corresponding to the input information of the same category, so as to obtain a first calculation result.
In some optional implementations of this embodiment, the first determining module may be further configured to: determine whether the Euclidean distances between the feature vectors corresponding to the plurality of input information of the same category are all smaller than a first preset distance threshold; if they all are, the preset condition is met; if not all of them are, the preset condition is not met.
In some alternative implementations of the present embodiment, the set of input information may include a plurality of different categories of input information; the training subunit may include: a second calculation module (not shown in the figure) configured to calculate a distance between each of a plurality of feature vectors corresponding to a plurality of different types of input information, to obtain a second calculation result; a second determining module (not shown in the figure) configured to determine whether the preset condition is satisfied based on the second calculation result.
In some optional implementations of the present embodiment, the second computing module may be further configured to: and calculating Euclidean distances among the feature vectors in the plurality of feature vectors corresponding to the input information of the different categories to obtain a second calculation result.
In some optional implementations of this embodiment, the second determining module may be further configured to: determine whether the Euclidean distances between the feature vectors corresponding to the plurality of input information of different categories are all greater than a second preset distance threshold; if they all are, the preset condition is met; if not all of them are, the preset condition is not met.
In some optional implementations of the present embodiment, the apparatus 400 for training a convolutional neural network may further include: an acquisition unit (not shown in the figure) configured to acquire the first input information and the second input information; an input unit (not shown in the figure) configured to input the first input information and the second input information to the convolutional neural network after training is completed, to obtain a feature vector of the first input information and a feature vector of the second input information; a calculating unit (not shown in the figure) configured to calculate a distance between the feature vector of the first input information and the feature vector of the second input information; a determining unit (not shown in the figure) configured to determine whether the first input information and the second input information are the same category of input information based on the calculated distance, and output a determination result.
Referring now to FIG. 5, there is illustrated a schematic diagram of a computer system 500 suitable for use in implementing a server of an embodiment of the present application. The server illustrated in fig. 5 is merely an example, and should not be construed as limiting the functionality and scope of use of the embodiments herein.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output portion 507 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as needed so that a computer program read therefrom is mounted into the storage section 508 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or installed from the removable media 511. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium described in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware. The described units may also be provided in a processor, for example, described as: a processor includes a normalization unit and a training unit. The names of these units do not limit the units themselves in some cases, for example, the training unit may also be described as "a unit that trains an initialized convolutional neural network with a normalized input information set of each layer, resulting in a trained convolutional neural network".
As another aspect, the present application also provides a computer-readable medium that may be contained in the server described in the above embodiment; or may exist alone without being assembled into the server. The computer readable medium carries one or more programs which, when executed by the server, cause the server to: for each layer in the initialized convolutional neural network, storing an input information set of the layer by utilizing at least one display card, wherein each display card in the at least one display card stores at least part of the input information set of the layer; calculating the mean value and variance of at least part of the input information set of the layer stored in each display card in at least one display card; the method comprises the steps of sending the mean value and the variance of at least part of input information sets of the layer stored in each display card of at least one display card to other display cards to calculate the mean value and the variance of the input information sets of the layer; normalizing the input information set of the layer by using the mean value and the variance of the input information set of the layer to obtain a normalized input information set of the layer; and training the initialized convolutional neural network by utilizing the normalized input information set of each layer to obtain the convolutional neural network after training.
The foregoing description is only of the preferred embodiments of the present application and is presented as an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the invention referred to in this application is not limited to the specific combinations of features described above; it is also intended to cover other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example, embodiments formed by interchanging the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (14)

1. A method for training a convolutional neural network, the method comprising:
for each layer of the initialized convolutional neural network, storing an input information set of the layer using a plurality of display cards, wherein each of the plurality of display cards stores at least part of the input information set of the layer; calculating the mean value and variance of the at least part of the input information set of the layer stored in each of the plurality of display cards; sending the mean value and the variance of the at least part of the input information set of the layer stored in each of the plurality of display cards to the other display cards, so that each display card stores the mean values and variances of the at least part of the input information sets of the layer stored in the other display cards and calculates the mean value and the variance of the input information set of the layer; and normalizing the input information set of the layer by using the mean value and the variance of the input information set of the layer to obtain a normalized input information set of the layer;
Training the initialized convolutional neural network by utilizing the normalized input information set of each layer to obtain a trained convolutional neural network, wherein the training comprises the following steps: inputting the normalized input information set of each layer into each layer of the initialized convolutional neural network to obtain a feature vector set, and obtaining a convolutional neural network after training based on the feature vector set;
acquiring first input information and second input information;
inputting the first input information and the second input information into the convolutional neural network after training is completed, and obtaining feature vectors of the first input information and feature vectors of the second input information;
calculating a distance between the feature vector of the first input information and the feature vector of the second input information;
based on the calculated distance, it is determined whether the first input information and the second input information are input information of the same category, and a determination result is output.
2. The method of claim 1, wherein inputting the normalized input information set for each layer into each layer of the initialized convolutional neural network to obtain a set of feature vectors, and obtaining a trained convolutional neural network based on the set of feature vectors, comprises:
The following training steps are performed: determining whether the feature vector set meets a preset condition, and if so, taking the initialized convolutional neural network as a convolutional neural network with training completed;
and in response to the preset condition not being met, adjusting parameters of the initialized convolutional neural network, and continuing to execute the training step.
3. The method of claim 2, wherein the set of input information comprises a plurality of input information of the same category; and
the determining whether the feature vector set meets a preset condition includes:
calculating the distance between each of a plurality of feature vectors corresponding to the plurality of input information of the same category to obtain a first calculation result;
and determining whether a preset condition is met or not based on the first calculation result.
4. The method of claim 3, wherein calculating the distance between each of the plurality of feature vectors corresponding to the plurality of input information of the same category, to obtain the first calculation result, includes:
and calculating Euclidean distances among the feature vectors in the plurality of feature vectors corresponding to the input information of the same category to obtain a first calculation result.
5. The method of claim 4, wherein determining whether a preset condition is satisfied based on the first calculation result comprises:
determining whether Euclidean distances among all the feature vectors in the plurality of feature vectors corresponding to the input information of the same category are smaller than a first preset distance threshold value;
if the Euclidean distances are all smaller than the first preset distance threshold, the preset condition is met;
if not all of them are, the preset condition is not met.
6. Method according to one of the claims 2-5, characterized in that the set of input information comprises a plurality of different categories of input information; and
the determining whether the feature vector set meets a preset condition includes:
calculating the distance between each of a plurality of feature vectors corresponding to the input information of the different categories to obtain a second calculation result;
and determining whether a preset condition is met or not based on the second calculation result.
7. The method of claim 6, wherein calculating the distance between each of the plurality of feature vectors corresponding to the plurality of different types of input information to obtain the second calculation result includes:
And calculating Euclidean distances among the feature vectors in the plurality of feature vectors corresponding to the input information of the different categories to obtain a second calculation result.
8. The method of claim 7, wherein determining whether a preset condition is satisfied based on the second calculation result comprises:
determining whether Euclidean distances among all the feature vectors in the plurality of feature vectors corresponding to the input information of the different categories are larger than a second preset distance threshold;
if the Euclidean distances are all greater than the second preset distance threshold, the preset condition is met;
and if not all of them are, the preset condition is not met.
9. An apparatus for training a convolutional neural network, the apparatus comprising:
the normalization unit is configured to, for each layer of the initialized convolutional neural network, store an input information set of the layer using a plurality of display cards, wherein each of the plurality of display cards stores at least part of the input information set of the layer; calculate the mean value and variance of the at least part of the input information set of the layer stored in each of the plurality of display cards; send the mean value and the variance of the at least part of the input information set of the layer stored in each of the plurality of display cards to the other display cards, so that each display card stores the mean values and variances of the at least part of the input information sets of the layer stored in the other display cards and calculates the mean value and the variance of the input information set of the layer; and normalize the input information set of the layer by using the mean value and the variance of the input information set of the layer to obtain a normalized input information set of the layer;
The training unit is configured to train the initialized convolutional neural network by using the normalized input information set of each layer to obtain a trained convolutional neural network, and comprises the following steps: inputting the normalized input information set of each layer into each layer of the initialized convolutional neural network to obtain a feature vector set, and obtaining a convolutional neural network after training based on the feature vector set;
an acquisition unit configured to acquire first input information and second input information;
an input unit configured to input the first input information and the second input information to the trained convolutional neural network, and obtain a feature vector of the first input information and a feature vector of the second input information;
a calculation unit configured to calculate a distance between a feature vector of the first input information and a feature vector of the second input information;
and a determining unit configured to determine whether the first input information and the second input information are input information of the same category based on the calculated distance, and output a determination result.
10. The apparatus of claim 9, wherein the training unit comprises:
a training subunit configured to perform the following training step: determining whether the feature vector set satisfies a preset condition, and if so, taking the initialized convolutional neural network as the trained convolutional neural network;
and an adjustment subunit configured to adjust parameters of the initialized convolutional neural network and continue to perform the training step in response to the preset condition not being satisfied.
11. The apparatus of claim 10, wherein the set of input information comprises a plurality of input information of the same category; and
the training subunit comprises:
a first calculation module configured to calculate the pairwise distances between the plurality of feature vectors corresponding to the plurality of input information of the same category, to obtain a first calculation result;
and a first determining module configured to determine, based on the first calculation result, whether the preset condition is satisfied.
12. The apparatus according to claim 10 or 11, wherein the set of input information comprises a plurality of different categories of input information; and
the training subunit comprises:
a second calculation module configured to calculate the pairwise distances between the plurality of feature vectors corresponding to the plurality of input information of different categories, to obtain a second calculation result;
and a second determining module configured to determine, based on the second calculation result, whether the preset condition is satisfied.
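Claims 11 and 12 describe a preset condition built from two distance computations: feature vectors of same-category inputs should lie close together, and feature vectors of different-category inputs should lie far apart. A minimal sketch of such a check follows; the function names and the two thresholds are assumptions for illustration, not values taken from the patent.

```python
import numpy as np

def pairwise_distances(vectors):
    """Euclidean distance matrix between all rows of `vectors`."""
    v = np.asarray(vectors, dtype=float)
    diff = v[:, None, :] - v[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def condition_met(same_cat, diff_cat, intra_max=0.5, inter_min=1.0):
    """Hypothetical preset condition: every same-category pair must be
    within `intra_max`, and every different-category pair at least
    `inter_min` apart. Thresholds are illustrative assumptions."""
    intra = pairwise_distances(same_cat)   # first calculation result
    inter = pairwise_distances(diff_cat)   # second calculation result
    iu_s = np.triu_indices(len(same_cat), k=1)  # upper-triangle pairs
    iu_d = np.triu_indices(len(diff_cat), k=1)
    return bool(intra[iu_s].max() <= intra_max and
                inter[iu_d].min() >= inter_min)
```

When the condition fails, the adjustment subunit of claim 10 would update the network parameters and repeat the training step.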
13. A server, the server comprising:
one or more processors;
a storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
14. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-8.
CN201710859122.XA 2017-09-21 2017-09-21 Method and apparatus for training convolutional neural network Active CN107609645B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710859122.XA CN107609645B (en) 2017-09-21 2017-09-21 Method and apparatus for training convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710859122.XA CN107609645B (en) 2017-09-21 2017-09-21 Method and apparatus for training convolutional neural network

Publications (2)

Publication Number Publication Date
CN107609645A CN107609645A (en) 2018-01-19
CN107609645B true CN107609645B (en) 2024-04-02

Family

ID=61061752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710859122.XA Active CN107609645B (en) 2017-09-21 2017-09-21 Method and apparatus for training convolutional neural network

Country Status (1)

Country Link
CN (1) CN107609645B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109598304B (en) * 2018-12-04 2019-11-08 北京字节跳动网络技术有限公司 Disaggregated model calibration method, device, equipment and readable medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268524A (en) * 2014-09-24 2015-01-07 朱毅 Convolutional neural network image recognition method based on dynamic adjustment of training targets
CN104809426A (en) * 2014-01-27 2015-07-29 日本电气株式会社 Convolutional neural network training method and target identification method and device
CN106960243A (en) * 2017-03-06 2017-07-18 中南大学 A kind of method for improving convolutional neural networks structure

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9811775B2 (en) * 2012-12-24 2017-11-07 Google Inc. Parallelizing neural networks during training

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809426A (en) * 2014-01-27 2015-07-29 日本电气株式会社 Convolutional neural network training method and target identification method and device
CN104268524A (en) * 2014-09-24 2015-01-07 朱毅 Convolutional neural network image recognition method based on dynamic adjustment of training targets
CN106960243A (en) * 2017-03-06 2017-07-18 中南大学 A kind of method for improving convolutional neural networks structure

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Batch Normalization for Multi-GPU / Data Parallelism #7439; kiranvaidhya; https://github.com/tensorflow/tensorflow/issues/7439; 2017-02-23; pp. 5-6 *
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift; Sergey Ioffe, Christian Szegedy; https://arxiv.org/abs/1502.03167v1; 2015-02-11; full text *
Multi-GPU version of batch normalization implemented on MXNet; 牛排; https://zhuanlan.zhihu.com/p/27069202; 2017-05-23; full text *
Wan Shining. Research and Implementation of Face Recognition Based on Convolutional Neural Networks. China Masters' Theses Full-text Database, Information Science and Technology. 2017 *
Research and Implementation of Face Recognition Based on Convolutional Neural Networks; Wan Shining; China Masters' Theses Full-text Database, Information Science and Technology; 2017-02-15; p. 26, pp. 49-61, pp. 69-71, pp. 75-77 *
Multi-GPU parallel framework for deep convolutional neural networks (CNNs) and its application to image recognition; Tencent Big Data; https://data.qq.com//article?id=1516; 2014-12-29; full text *

Also Published As

Publication number Publication date
CN107609645A (en) 2018-01-19

Similar Documents

Publication Publication Date Title
CN108229419B (en) Method and apparatus for clustering images
CN107633218B (en) Method and apparatus for generating image
CN112732911B (en) Semantic recognition-based speaking recommendation method, device, equipment and storage medium
CN108280477B (en) Method and apparatus for clustering images
US20190080148A1 (en) Method and apparatus for generating image
CN108197652B (en) Method and apparatus for generating information
CN107609506B (en) Method and apparatus for generating image
US10719693B2 (en) Method and apparatus for outputting information of object relationship
CN110555714A (en) method and apparatus for outputting information
CN111930894B (en) Long text matching method and device, storage medium and electronic equipment
CN109299477A (en) Method and apparatus for generating text header
CN112149699B (en) Method and device for generating model and method and device for identifying image
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN112307738B (en) Method and device for processing text
CN107609645B (en) Method and apparatus for training convolutional neural network
CN116127925B (en) Text data enhancement method and device based on destruction processing of text
CN114241411B (en) Counting model processing method and device based on target detection and computer equipment
CN111858916A (en) Method and device for clustering sentences
CN115546554A (en) Sensitive image identification method, device, equipment and computer readable storage medium
CN115082598A (en) Text image generation method, text image training method, text image processing method and electronic equipment
CN111784377B (en) Method and device for generating information
CN111784787B (en) Image generation method and device
CN113822313A (en) Method and device for detecting abnormity of graph nodes
CN113780324A (en) Data processing method and device, electronic equipment and storage medium
CN112966150A (en) Video content extraction method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant