WO2016125476A1

WO2016125476A1 - Determination method and program

Info

Publication number: WO2016125476A1
Application number: PCT/JP2016/000462
Authority: WO
Inventors: ミンヤンキム; ルカリガッツィオ; 宗太郎築澤; 和紀小塚
Original assignee: パナソニックＩｐマネジメント株式会社
Priority date: 2015-02-06
Filing date: 2016-01-29
Publication date: 2016-08-11

Abstract

Provided is a determination method for determining a structure of a convolutional neural network, comprising: an acquisition step (S10) of acquiring, as initial values, N filters (where N is a natural number greater than or equal to 1) for which weights are learned using an image group for learning; and a segmenting step (S20) of increasing the N filters to M filters (where M is a natural number greater than or equal to 2), with M being greater than N, by carrying out conversions which are used in an image processing sector upon at least one of the N filters and appending the filters whereupon the conversion is carried out.

Description

Determination method and program

The present disclosure relates to a determination method and a program, and more particularly to a determination method and a program for determining a structure of a convolutional neural network.

In recent years, the performance of image recognition has improved dramatically by using Deep Learning. Deep Learning is known as a machine learning methodology using a multilayer neural network. For such a multilayer neural network, for example, a convolutional neural network is used (see Non-Patent Document 1, for example). Here, the convolutional neural network is composed of a multilayer neural network that repeats convolution and pooling of a local region. Non-Patent Document 1 proposes a structure of a neural network in which a plurality of convolution layers are sandwiched between pooling layers as a structure of a convolution neural network, and this structure improves image recognition performance. It is disclosed that it can be improved.

In order to solve the above-described problem, a determination method according to an aspect of the present disclosure is a determination method for determining a structure of a convolutional neural network, in which N pieces of weights learned using a learning image group ( N is a natural number of 1 or more) as an initial value, and at least one of the N filters is added with a filter subjected to conversion used in the image processing field. And a division step of increasing the number of filters to M filters larger than N (M is a natural number of 2 or more).

These general or specific modes may be realized by a system, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM. The system, method, integrated circuit, computer You may implement | achieve with arbitrary combinations of a program and a recording medium.

According to the present disclosure, it is possible to realize a determination method that can more easily determine the structure of the convolutional neural network.

FIG. 1 is a block diagram illustrating an example of the configuration of the determination apparatus according to the embodiment. FIG. 2A is a diagram for explaining an overview of the identification processing of the convolutional neural network. FIG. 2B is a diagram for explaining the outline of the identification processing of the convolutional neural network. FIG. 3 is a diagram for explaining the outline of the division processing performed by the determination apparatus shown in FIG. FIG. 4 is a diagram illustrating an example of the dividing process performed by the dividing unit illustrated in FIG. FIG. 5 is a diagram for explaining an overview of the integration process performed by the determination apparatus shown in FIG. FIG. 6 is a diagram illustrating an example of integration processing performed by the integration unit illustrated in FIG. 1. FIG. 7 is a flowchart illustrating an example of the determination process in the embodiment. FIG. 8 is a flowchart showing an example of detailed processing in step S20 shown in FIG. FIG. 9 is a flowchart showing an example of detailed processing in step S30 shown in FIG. FIG. 10 is a diagram illustrating an example of details of the algorithm of the determination process of the determination device according to the first embodiment. FIG. 11 is a diagram illustrating an example of a determination process performed by the determination apparatus according to the first embodiment. FIG. 12A is a diagram for describing the effectiveness of the integration processing according to the first embodiment. FIG. 12B is a diagram for explaining the effectiveness of the integration processing according to the first embodiment. FIG. 12C is a diagram for explaining the effectiveness of the integration processing according to the first embodiment. FIG. 12D is a diagram for describing the effectiveness of the integration processing according to the first embodiment. FIG. 13 is a diagram illustrating identification performance values in each of a plurality of data sets in the second embodiment. FIG. 14 is a diagram illustrating an example of a model structure using the MNIST data set. FIG. 15 is a diagram illustrating an error rate when the division process or the integration process of the present disclosure is performed on the MNIST model structure. FIG. 16 is a diagram illustrating an error rate when the division processing or the integration processing according to the present disclosure is performed on the GTSRB1 model structure. FIG. 17 is a diagram illustrating an output value of an error function when the GTSRB1 model structure and the GTSRB1 model structure subjected to the division process or the integration process of the present disclosure are optimized. FIG. 18 is a diagram illustrating an example of a model structure using the GTSRB data set. FIG. 19 is a diagram illustrating an error rate when the division process or the integration process of the present disclosure is performed on the GTSRB-3DNN model structure. FIG. 20 is a diagram illustrating an error rate when the division processing or the integration processing according to the present disclosure is performed on the CIFAR-10 model structure. FIG. 21 is a diagram illustrating a comparison of identification calculation times when the integration processing of the present disclosure is performed.

(Knowledge that became the basis of the invention)
In recent years, a multilayer neural network such as DNN (Deep Neural Network) has been remarkably improved in its performance by being used not only for image recognition but also for machine learning such as speech recognition and machine translation. DNN has been successful in improving their performance with theoretically proven modeling and generalization capabilities. The results can be obtained substantially by improving training algorithms that perform parameter detection at high speed, improving ever-increasing data sets, and enhancing new computer platforms.

However, parameter determination (so-called training) is usually started after the structure of the multilayer neural network is manually determined by experts in the field. The structure of the multi-layer neural network occupies a considerable part in the above performance improvement, and is determined depending on repeated experiments by highly experienced experts.

For example, Non-Patent Document 1 proposes a structure of a neural network in which a plurality of convolution layers are sandwiched by sandwiching a plurality of convolution layers between pooling layers as described above. However, the structure is complicated and can be determined (designed) only by an expert.

That is, there is a problem that a person other than an expert cannot determine (design) the structure of a convolutional neural network effective for image recognition.

Therefore, the inventors (e.g.) have come up with a determination method and a program for determining the structure of a convolutional neural network simply (or automatically) while performing parameter determination.

That is, the determination method according to an aspect of the present disclosure is a determination method for determining the structure of a convolutional neural network, and N weights (N is a natural number equal to or greater than 1) in which weights are learned using a learning image group. ) Filter as an initial value, and at least one of the N filters, a filter subjected to conversion used in the field of image processing is added to obtain the N filters. A dividing step of increasing the number of filters to larger M (M is a natural number of 2 or more).

This makes it possible to more easily determine the structure of the convolutional neural network, so that non-experts can use the structure of the convolutional neural network that is effective for image recognition.

Further, for example, the division step includes a division evaluation step of evaluating the identification performance of the M filters by causing the M filters to learn weights using the learning image group, When the discrimination performance evaluated in the evaluation step is equal to or lower than the discrimination performance of the N filters, the division step may be performed again.

Further, for example, the M filters are clustered and a cluster-centered filter is selected, so that the M filters are reduced to L filters (L is a natural number of 1 or more) smaller than the M filters. An integration step for integration may be included.

Further, for example, in the integration step, the M filters may be clustered into L clusters determined in advance using a k-means method.

Further, for example, in the integration step, the M filters may be clustered using an affinity affinity propagation method.

Further, for example, the transformation may include rotational transformation at a randomly determined angle, and the division step may add a filter that has undergone the rotational transformation to at least one of the N filters. .

In addition, for example, the conversion includes addition of Gaussian noise having a standard deviation that is randomly determined, and in the division step, a filter having the Gaussian noise added is added to at least one of the N filters. You may do that.

In addition, for example, the conversion includes a contrast conversion that converts the contrast ratio to be determined at random, and in the division step, a filter that has been subjected to the contrast conversion on at least one of the N filters. It may be added.

In addition, for example, the conversion includes a scale conversion for converting to a scale determined at random. In the division step, a filter that has undergone the scale conversion is added to at least one of the N filters. You may do that.

Each of the embodiments described below shows a specific example of the present disclosure. Numerical values, shapes, components, steps, order of steps, and the like shown in the following embodiments are merely examples, and are not intended to limit the present disclosure. In addition, among the constituent elements in the following embodiments, constituent elements that are not described in the independent claims indicating the highest concept are described as optional constituent elements. In all the embodiments, the contents can be combined.

(Embodiment)
Below, the determination method of the determination apparatus 10 in embodiment is demonstrated, referring drawings.

[Configuration of Determination Device 10]
FIG. 1 is a block diagram illustrating an example of the configuration of the determination apparatus 10 according to the present embodiment. FIG. 2A and FIG. 2B are diagrams for explaining an outline of the identification processing of the convolutional neural network.

1 includes an acquisition unit 11, a division unit 12, an integration unit 13, and an output unit 15, and determines the structure of a convolutional neural network. The determination device 10 is realized by a computer or the like.

Here, general processing of a convolutional neural network (CNN) will be described below.

A convolutional neural network is often used in the field of image recognition, and extracts a feature amount from an image by performing convolution with a filter on a two-dimensional image. As described above, the convolutional neural network is composed of a multilayer network that repeats convolution and pooling. In the convolutional neural network, the coefficient of the filter effective for identification constituting the convolutional layer is learned using a large amount of data such as a large amount of learning images (learning image group). The coefficient can be obtained by performing learning to acquire invariance to various deformations by repeating convolution using a filter and pooling that collects reactions in a certain region using a large amount of data. It has been found that the discrimination performance of the convolutional neural network depends on the filters constituting the convolutional layer.

In the example shown in FIG. 2A and FIG. 2B, a convolutional neural network composed of a two-layer network is shown in which a learning image group is used as a large amount of data and a filter coefficient effective for image identification is learned. Yes. And the process which identifies the numerical image which shows 9 to such a convolution neural network is shown. In FIGS. 2A and 2B, a ramp function (ReLU) is used as an activation function after convolution by a filter.

In the determination apparatus 10 according to the present embodiment, the filters constituting the convolution layer of the convolutional neural network are determined as the structure of the convolutional neural network. In the case where there are a plurality of convolution layers, a filter constituting at least one convolution layer is determined. Of course, you may determine the filter which comprises all the convolution layers. Thereby, the determination apparatus 10 can determine the convolution neural network comprised by the convolution layer which has the determined filter.

[Acquisition unit 11]
The acquisition unit 11 acquires a plurality of filters as initial values or acquires a learning image.

More specifically, the acquisition unit 11 acquires N (N is a natural number of 1 or more) filters whose weights have been learned using the learning image group as initial values. Note that the acquisition unit 11 may acquire a plurality of filters divided by the dividing unit 12 and a plurality of filters integrated by the integration unit 13 as initial values.

Also, the acquisition unit 11 acquires a learning image group. Here, the learning image group is a data set of a plurality of images prepared in advance, such as an MNIST data set and a GTSRB data set.

[Division unit 12]
FIG. 3 is a diagram for explaining an overview of the division process performed by the determination apparatus 10 illustrated in FIG. 1. FIG. 4 is a diagram illustrating an example of the dividing process performed by the dividing unit 12 illustrated in FIG. Note that the plurality of filters illustrated in FIG. 3A correspond to the plurality of filters constituting one of the two convolution layers illustrated in FIG. 2B. Further, the pre-division filter shown in FIG. 4A corresponds to a plurality of filters shown in FIG.

The dividing unit 12 performs a dividing process on a plurality of filters acquired as initial values, such as N filters acquired by the acquiring unit 11. For example, the dividing unit 12 performs a dividing process on a plurality of filters (32 in the figure) as shown in FIG. 3A, and the number of filters (in the figure shown in FIG. 3B). 96).

More specifically, the dividing unit 12 adds N filters that have undergone conversion used in the image processing field to at least one of the N filters as an initial value, thereby adding N filters. A division process for increasing the number of filters to larger M (M is a natural number of 2 or more) is performed.

Here, when the identification performance of the M filters is higher than the identification performance of the N filters, the dividing unit 12 further performs at least one of the M filters in the image processing field. It is also possible to perform division processing for increasing the M filters to P filters larger than M (P is a natural number of 3 or more) by adding filters to which conversion is used. Further, such division processing may be repeated up to a specified number of times that is a predetermined number of times by a user using the determination device 10 or the like. The number of filters after the increase may be determined by the user who uses the determination device 10.

In addition, when the identification performance of the M filters is equal to or lower than the identification performance of the N filters, the division unit 12 may perform the division process on the N filters again.

In the above, the identification performance of a plurality of filters means the identification performance of a convolutional neural network having the plurality of filters. The same applies to the following.

In the present embodiment, as shown in FIG. 1, the dividing unit 12 includes a random converting unit 121, a filter adding unit 122, and an identification performance evaluating unit 123.

The random conversion unit 121 performs conversion used in the image processing field on at least one of the plurality of filters acquired by the acquisition unit 11 as an initial value. The filter adding unit 122 adds a filter subjected to conversion used in the image processing field by the random conversion unit 121 to a plurality of filters acquired by the acquisition unit 11 as initial values and stored in a memory (not illustrated). .

Here, the conversion performed by the random conversion unit 121 may be performed by selecting from image conversions (conversion sets) known in the image processing field. For example, when the conversion performed by the random conversion unit 121 is a rotation conversion at a randomly determined angle, the random conversion unit 121 may perform the rotation conversion on at least one of the N filters. And the filter addition part 122 should just add the filter which the said random conversion part 121 performed the said rotation conversion.

Further, for example, when the conversion performed by the random conversion unit 121 is addition of Gaussian noise with a standard deviation determined at random, the random conversion unit 121 applies the Gaussian noise to at least one of the N filters. Can be given. And the filter addition part 122 should just add the filter to which the random conversion part 121 gave the said Gaussian noise.

For example, when the conversion performed by the random conversion unit 121 includes a contrast conversion that converts the contrast ratio to be determined at random, the random conversion unit 121 applies to at least one of the N filters. What is necessary is just to perform the said contrast conversion. And the filter addition part 122 should just add the filter which the said random conversion part 121 performed the said contrast conversion.

Further, for example, when the conversion performed by the random conversion unit 121 is a scale conversion that performs conversion so as to have a randomly determined scale, the random conversion unit 121 applies the change to at least one of the N filters. Scale conversion should be applied. And the filter addition part 122 should just add the filter which the said random conversion part 121 performed the said scale conversion.

Note that the conversion is rotation conversion at a randomly determined angle, provision of Gaussian noise with a randomly determined standard deviation, contrast conversion for converting to a randomly determined contrast ratio, or randomly The present invention is not limited to the case where scale conversion is performed so that the scale is determined. For example, contrast inversion conversion, isometric conversion, or the like may be included, and two or more combinations of these (conversion sets) may be included. If you select a rotation transformation at a randomly determined angle (random rotation transformation) and a randomly determined standard deviation Gaussian noise addition (random Gaussian noise addition) from the transformation set, the convolutional neural network A consistent improvement in identification performance can be expected. Hereinafter, an example of this case will be described with reference to FIG.

The pre-division filter shown in (a) of FIG. 4 is a plurality of filters that are initial values acquired by the acquisition unit 11, and the filter shown in (b) of FIG. 4 is one of the pre-division filters. It is a filter. As shown in FIG. 4C, the random conversion unit 121 performs the rotation conversion (described as random rotation conversion in the drawing) and the Gaussian noise of the filter shown in FIG. Giving (denoting the addition of random Gaussian noise) and generating a rotation filter and a blurred filter. As shown in FIG. 4D, the filter adding unit 122 temporarily adds the rotation filter and the blurred filter generated by the random conversion unit 121 to a plurality of filters that are initial values. The identification performance evaluation unit 123 described later evaluates the identification performance of a filter in which a rotation filter and a blurred filter are added to a plurality of filters that are initial values. Then, when the identification performance of the plurality of filters that are the initial values is higher, as shown in FIG. 4E, a filter in which a rotation filter and a blurred filter are added to the plurality of filters that are the initial values Is used as a post-division filter, and the division processing of the division unit 12 is terminated. The post-division filter shown in (e) of FIG. 4 corresponds to the filter shown in (b) of FIG.

The identification performance evaluation unit 123 uses the learning image group to cause the filter increased by the added filter to learn the weight and evaluate the identification performance of the increased filter. More specifically, the identification performance evaluation unit 123 uses the learning image group to cause the filter of the convolutional neural network having the filter increased by the added filter in the convolution layer to learn the weight, and to increase the filter Evaluate the discrimination performance of

The discriminating performance evaluation unit 123 divides the increased filter after dividing the increased filter when the discriminating performance of the increased filter evaluated is higher than the discriminating performance of the plurality of filters acquired by the acquisition unit 11 that is the initial value. Adopt as. In addition, the identification performance evaluation unit 123 causes the random conversion unit 121 to perform initial initialization when the evaluated identification performance of the increased filter is equal to or less than the identification performance of the plurality of filters acquired by the acquisition unit 11 that is the initial value. The division process is performed again on the plurality of value filters.

More specifically, the discrimination performance evaluation unit 123 uses the learning image group to learn the weights of, for example, M filters that have increased from N to M, thereby improving the discrimination performance of the M filters. evaluate. The discrimination performance evaluation unit 123 employs the M filters as post-division filters when the discrimination performance of the evaluated M filters is higher than the discrimination performance of the N filters that are the initial values. On the other hand, when the discriminating performance of the M filters evaluated is equal to or lower than the discriminating performance of N filters that are initial values, the discriminating performance evaluation unit 123 causes the random conversion unit 121 to receive The filter is divided again.

In the present embodiment, the dividing unit 12 has been described as performing the dividing process using the plurality of filters acquired by the acquiring unit 11 as initial values, but the present invention is not limited to this. The division process may be performed again using the divided filter after the division process as an initial value, or the division process may be performed using the integrated filter output by the integration unit 13 as an initial value.

Further, when the division process is performed a plurality of times, the identification performance evaluation unit 123 calculates the filter identification performance increased by the division process and the filter identification performance increased by the previous division process instead of the initial value. Compare.

[Integration unit 13]
FIG. 5 is a diagram for explaining an overview of the integration process performed by the determination apparatus 10 illustrated in FIG. 1. FIG. 6 is a diagram illustrating an example of integration processing performed by the integration unit 13 illustrated in FIG. A plurality of filters (pre-integration filters) shown in FIG. 6A correspond to a plurality of filters shown in FIG. 5A, and a plurality of filters shown in FIG. The post-integration filter) corresponds to a plurality of filters shown in FIG.

The integration unit 13 performs integration processing on a plurality of filters acquired as initial values, such as the N filters acquired by the acquisition unit 11 and the filters after the division processing. For example, the integration unit 13 performs integration processing on a plurality of filters (156 in the figure) as shown in FIG. 5A, and the number of filters (in the figure, FIG. 5B). To 32).

More specifically, the integration unit 13 performs an integration process to reduce the number of the plurality of filters by clustering the divided filters that have been divided by the dividing unit 12 and selecting a cluster-centered filter. This is because overlearning can be prevented, and the recognition performance can be improved, for example, by reducing the error rate during identification and performing image recognition with higher accuracy. Note that the plurality of filters that the integration unit 13 performs the integration process is not limited to the post-division filter that has been subjected to the division process by the division unit 12, but may be a plurality of filters that are acquired by the acquisition unit 11 as initial values.

In the present embodiment, as shown in FIG. 1, the integration unit 13 includes a clustering unit 131 and a filter selection unit 132.

The clustering unit 131 clusters the M filters that are the post-division filters subjected to the division processing by the division unit 12. As a result, the clustering unit 131 clusters the M filters into L clusters.

Here, the clustering unit 131 may cluster the M filters into L clusters that are determined in advance using the k-means method, or the result of clustering the M filters using the affinity-propagation method. , It may be clustered into L clusters. The k-means method uses a cluster average as a data distribution and classifies it into a given number K of clusters. On the other hand, the affinity-propagation method is a clustering method recently proposed by Frey et al., And it is not necessary to determine the number of clusters in advance, and the algorithm automatically determines the number of clusters. In addition, Affinityationpropagation method is a method of alternately updating and converging responsibility and availability, so there is no dependency on initial value, and clustering accuracy is better than existing clustering methods represented by k-means method etc. . Note that the clustering method using the k-means method or the affinity-propagation method is an existing clustering method, and thus detailed description thereof is omitted here.

The filter selection unit 132 selects a cluster-centered filter from the M filters clustered into L clusters by the clustering unit 131 and stored in a memory (not shown). Here, for example, the filter selection unit 132 calculates the vector centroids of a plurality of filters belonging to each of the L clusters, and selects the filter closest to the vector centroid, thereby becoming the cluster centers of the L clusters. Select a filter. In this way, the integration unit 13 integrates the M filters that are the post-division filters subjected to the division process in the division unit 12 into L filters (L is a natural number of 1 or more) smaller than the M filters.

Hereinafter, an example in which the clustering unit 131 performs clustering using the k-means method will be described with reference to FIG. The pre-integration filter shown in (a) of FIG. 6 is the division filter shown in (e) of FIG. 4 and is a post-division filter that has been subjected to division processing by the division unit 12. FIG. 6B shows an example in which clustering is performed by determining the boundary line from the data distribution so that the predetermined number of clusters is obtained using the k-means method.

As shown in FIG. 6B, the clustering unit 131 uses the k-means method for the pre-integration filter shown in FIG. Clustering is performed so that a predetermined number of clusters is determined in advance. Then, as shown in FIG. 6C, the filter selection unit 132 selects a filter (denoted as filter a in the figure) that is closest to the cluster center of each of a predetermined number of clusters, and serves as an integrated filter. adopt.

Note that the clustering unit 131 may cluster the N filters acquired by the acquisition unit 11 as initial values. In this case, the filter selection unit 132 selects a cluster-centered filter for each cluster among N filters that are clustered by the clustering unit 131 and stored in a memory (not shown) or the like. In this way, the integration unit 13 can integrate the N filters acquired by the acquisition unit 11 as initial values into a smaller number of filters.

Further, the integration unit 13 may further include an identification performance evaluation unit that uses the learning image group to cause the post-integration filter to learn the weight and evaluate the identification performance of the post-integration filter. In this case, when the identification performance evaluated by the identification performance evaluation unit is equal to or lower than the identification performance of the pre-integration filter, the integration process is performed again. If clustering is performed using the k-means method, the integration unit 13 changes the predetermined number of clusters and performs integration processing again. If the clustering is performed using the Affinity propagation method, the integration unit 13 The integration process may be performed again by changing parameters in the algorithm such as diagonal elements of the matrix.

[Output unit 15]
The output unit 15 outputs the filter divided by the dividing unit 12 or the filter integrated by the integrating unit 13 as a filter constituting the convolutional neural network determined by the determining device 10. Note that the output unit 15 is not an essential component and may be a memory. In this case, the filter divided by the dividing unit 12 or the filter integrated by the integrating unit 13 is stored as a filter constituting the convolutional neural network determined by the determining device 10.

[Determination Processing of Determination Device 10]
Next, the determination process of the determination apparatus 10 configured as described above will be described with reference to the drawings.

FIG. 7 is a flowchart showing an example of the determination process in the present embodiment. FIG. 8 is a flowchart showing an example of detailed processing in step S20 shown in FIG. FIG. 9 is a flowchart showing an example of detailed processing in step S30 shown in FIG.

First, in step S10, the determination apparatus 10 performs an acquisition process.

More specifically, prior to step S10, the weights of a plurality of filters constituting one or more convolution layers constituting the convolutional neural network are learned using the learning image group (S9). The determination device 10 acquires, as an initial value, N (N is a natural number of 1 or more) filters constituting at least one convolution layer among a plurality of filters whose weights have been learned using the learning image group. (S11).

Next, in step S20, the determination apparatus 10 performs a division process.

More specifically, at least one of the initial N filters acquired in step S11 is subjected to conversion used in the image processing field (S21), and the filter subjected to conversion used in the image processing field is converted. Add to the initial N filters (S22). Thereby, the N filters having the initial value can be increased to M filters (M is a natural number of 2 or more) larger than N filters. Here, the conversion may be performed by selecting one of the above-described conversion sets, but since the details have been described above, description thereof is omitted here. Next, the determination apparatus 10 evaluates the identification performance of the M filters by using the learning image group to learn the weights of the M filters, and exceeds the identification performance of the initial N filters. Is determined (S23). In step S23, when the identification performance of the M filters is equal to or lower than the identification performance of the N filters (No in S23) and is equal to or less than a predetermined number of times (specified number) (No in S24), Returning to S21, the division process is performed again. On the other hand, when the identification performance of the M filters is higher than the identification performance of the N filters (Yes in S24), the division process is terminated.

When the division process is repeated up to a predetermined number of times (a prescribed number), in step S10, a post-division filter, which is M filters that have been divided, is acquired as the initial value filter, and step S20 is performed again. Good.

Next, in step S30, the determination apparatus 10 performs integration processing.

More specifically, the determination apparatus 10 clusters the M filters that have been divided in step S20, which is an initial value filter (S31). Here, the determination apparatus 10 clusters the M filters that have been divided in step S20 into L (L is a natural number greater than or equal to 1) clusters as a result. Next, the determination apparatus 10 selects a cluster-centered filter for each of the L clusters (S32). In this way, the determination apparatus 10 integrates M filters into L filters that are smaller than M.

Note that the dividing process in step S20 and the integrating process in step S30 may be performed independently, or the integrating process in step S30 may be performed first, followed by the dividing process in step S20. Further, as described above, the integration process of step S30 may be performed after the division process of step S20 is repeated up to a predetermined number of times (specified number of times).

[Effectiveness of decision processing]
Next, the effectiveness of the determination process of the determination device 10 described above will be described by giving an example.

(Example 1)
FIG. 10 is a diagram illustrating an example of details of the algorithm of the determination process of the determination device 10 according to the first embodiment.

In FIG. 10, the algorithm described by “// SPLIT” is an example of the above-described division processing algorithm, and the algorithm described by “// SPLIT” is an example of the above-described division processing algorithm. Further, “δ ₀ , δ ₁ , δ ₂ ” indicates the evaluation value of the discrimination performance, and “Kernel” indicates a filter that constitutes at least one convolution layer of the convolution neural network.

FIG. 11 is a diagram illustrating an example of a determination process performed by the determination apparatus 10 according to the first embodiment. That is, in this embodiment, an example in which the integration process is performed first and then the division process is performed is shown.

More specifically, the determination device 10 according to the first embodiment performs integration processing on 150 filters as initial values, reduces the number of filters to 32, and learns weights using a learning image group. In addition, the identification performance of 32 filters is evaluated. In the integration process of this embodiment, clustering is performed using the k-means method as shown in FIG.

Then, the determination device 10 according to the first embodiment performs a division process on the 32 filters whose weights are learned using the learning image group, increases the number of filters to 96, and uses the learning image group. After learning the weights, the identification performance of the 32 filters is evaluated. In the division processing of the present embodiment, as shown in FIG. 10, rotation conversion at an angle determined at random and addition of Gaussian noise with a standard deviation determined at random are performed.

12A to 12D are diagrams for explaining the effectiveness of the integration processing of the present embodiment.

FIG. 12A shows an example of a test image, and an image in which a sign displaying 120 is tilted by about 30 degrees is shown. Further, the test image shown in FIG. 12A is misclassified by a convolutional neural network having a filter as an initial value in which the weight is learned by the learning image.

FIG. 12B is a diagram showing the softmax probability of the test image shown in FIG. 12A. In FIG. 12B, the response value of the output of 43 classes of the convolutional neural network having the filter as the initial value is shown by the softmax probability. In a neural network that performs category identification, the maximum value of output probability is output as a recognition result. When the convolutional neural network having a filter as an initial value classifies (identifies) the test image (correct answer label = 7) shown in FIG. 12A, a large reaction value is output in category 15 and it can be seen that it is misclassified.

FIG. 12C is a diagram showing an example of the softmax probability of the test image shown in FIG. 12A classified by the convolutional neural network having the filter after the division processing shown in FIG. Thereby, when the convolutional neural network having the filter after the division processing classifies (identifies) the test image shown in FIG. 12A, the response value to the correct answer label is improved, and it is understood that the correct classification is performed without misclassification. .

FIG. 12D is a diagram showing an example of the softmax probability of the image shown in FIG. 12A classified by the convolutional neural network having 32 filters after the integration process shown in FIG. When the convolutional neural network having 32 filters after the integration process classifies (identifies) the test image shown in FIG. 12A, the response value in FIG. 12C is further improved, and it is correctly classified without misclassification. Recognize.

(Example 2)
Since the effectiveness of the division processing and the integration processing according to the present disclosure has been verified using a plurality of data sets including a learning image and a test image, the experimental result will be described as a second embodiment.

FIG. 13 is a diagram showing identification performance values when a plurality of data sets are used in the second embodiment. FIG. 13 shows the value of discrimination performance when using the MNIST (Mixed National Institute of Standards and Technology database) data set, GTSRB (German Traffic Sign Recognition Benchmark) and CIFAR-10 (Canadian Institute for Advanced Research) data set ( Reference value) is shown.

[MNIST]
FIG. 14 is a diagram illustrating an example of a model structure (MNIST model structure) using the MNIST data set. The MNIST data set is composed of 60,000 learning images and 10,000 test images of 28 × 28 size handwritten numerals. As shown in FIG. 14, the MNIST model structure is composed of a convolutional neural network composed of two layers of concatenation and two layers of convolution, and a pooling layer is provided after each convolution layer using the ReLU activation function. Have. FIG. 13 shows an error rate of 0.82% as an identification performance value (reference value) when a test image of the MNIST data set is identified by the MNIST model structure learned from the learning image of the MNIST data set. It is shown.

FIG. 15 is a diagram illustrating an error rate when the division process or the integration process of the present disclosure is performed on the MNIST model structure. Here, SPLIT [1] in FIG. 1 indicates that the first filter (ORIGINAL) is divided, and MERGE [4] is No.1. It shows that four filters SPLIT [1] are integrated. In FIG. 15, 100 filters (ORIGINAL) constituting the first convolution layer among the two convolution layers of the MNIST model structure are divided to increase to 200 filters, and the weight is re-applied in the learning image. An error rate of 0.58% when learning (SPLIT [1]) is shown. Further, the 200 filters (SPLIT [1]) subjected to the division processing are further integrated and reduced to 100 filters, and the error rate is 0 when the weight is re-learned with the learning image (MERGE [4]). .59% is shown.

On the other hand, as a comparative example, an error rate of 0.78% when 100 filters constituting the first convolution layer of the MNIST model structure are not subjected to division processing and 200 or 300 filters are learned from the initial state. Or 0.75%.

It can be seen that the error rate when the division process or the integration process of the present disclosure is performed on the MNIST model structure is improved by almost 30% with respect to the error rate of the comparative example or the MNIST model structure. Although the error rate is reduced by 0.01% by the integration processing after the division processing, the identification performance is almost maintained.

[GTSRB]
The GTSRB data set consists of 43 different classes of 39,209 learning images and 12,630 test images of standard German road signs. Note that the size of the image included in the GTSRB data set is not uniform from 15 × 15 pixels to 250 × 250 pixels, and if used as it is, the number of pixels included in one block at the time of learning fluctuates and affects recognition. To do. Therefore, in this embodiment, all the images of the GTSRB data set are resized to 48 × 48, and a preprocessing technique such as histogram smoothing or contrast normalization is applied. Hereinafter, the GTSRB data set to which the preprocessing technique is applied is referred to as a GTSRB data set.

The model structure (GTSRB1 model structure) using the GTSRB data set is a convolutional neural network composed of three convolution layers and two all connection layers. FIG. 13 shows an error rate of 2.44% as an identification performance value (reference value) when the test image of the GTSRB data set is identified in the GTSRB1 model structure learned from the learning image of the GTSRB data set. It is shown.

FIG. 16 is a diagram illustrating an error rate when the division processing or integration processing of the present disclosure is performed on the GTSRB1 model structure. Here, “N” of 4N in FIG. 16 indicates that the filter is divided by Gaussian noise, and “R” of 5R indicates that the filter is divided by rotational transformation. MERGE [No. ], SPLIT [No. ] Is represented in the same manner as described above. It can be seen that in all experiments where the split or integration process of the present disclosure was performed on the GTSRB1 model structure, a fairly good performance was achieved, or a performance equivalent to a considerably smaller model size was achieved.

FIG. 17 is a diagram illustrating an output value of an error function when the GTSRB1 model structure and the GTSRB1 model structure subjected to the division process or the integration process of the present disclosure are optimized. Here, GTSRB1_original and GTSRB1_merge were compared in the case of the same number of parameters. As shown in FIG. 17, when the GTSRB1 model structure is compared with the output value of the error function when the GTSRB1 model structure is learned (optimized) with the learning image of the GTSRB data set, the divided or integrated GTSRB1 model structure of the present disclosure is obtained. It can be seen that the output value of the error function when learning is performed with the learning image of the GTSRB data set is lower. That is, it can be seen that the structure of the convolutional neural network effective for image recognition can be easily determined by performing the dividing process or the integrating process of the present disclosure.

FIG. 18 is a diagram illustrating an example of a model structure (GTSRB-3DNN model structure) using the GTSRB data set.

The GTSRB-3DNN model structure is a convolutional neural network consisting of 3 convolutional layers and 2 fully connected layers, using different image sizes of 48x48 pixels, 38x48 pixels and 28x48 pixels as input. . Therefore, the GTSRB-3DNN model structure is a collective model structure as compared with the GTSRB-3DNN model structure, which is a simple model structure. FIG. 13 shows an error rate of 1.24 as a discrimination performance value (reference value) when a test image of the GTSRB data set is discriminated from the GTSRB-3DNN model structure learned from the learning image of the GTSRB data set. %It is shown.

FIG. 19 is a diagram illustrating an error rate when the division process or the integration process of the present disclosure is performed on the GTSRB-3DNN model structure. In all experiments where the partitioning or integration processing of the present disclosure was performed on the GTSRB-3DNN model structure, a fairly good performance was achieved, or a performance equivalent to a considerably small model size was achieved. I understand.

[CIFAR-10]
The CIFAR-10 data set consists of 50,000 learning images and 10,000 test images in 10 categories.

The model structure using the CIFAR-10 data set (CIFAR-10 model structure) utilized a convolutional neural network composed of three convolutional layers disclosed in Non-Patent Document 1. FIG. 13 shows an error as a discrimination performance value (reference value) when a CIFAR-10 model structure trained with a CIFAR-10 dataset learning image is used to identify a test image of the CIFAR-10 dataset. A rate of 10.4% is shown.

FIG. 20 is a diagram illustrating an error rate when the division processing or the integration processing according to the present disclosure is performed on the CIFAR-10 model structure.

As shown in FIG. 20, in all experiments in which the division processing or the integration processing of the present disclosure was performed on the filter (ORIGINAL) constituting the convolutional layer of the CIFAR-10 model structure, It can be seen that equivalent performance was achieved. In other words, it can be seen that it is effective to apply the dividing process or the integrating process of the present disclosure to the structure of a complicated and highly adjusted convolutional neural network as disclosed in Non-Patent Document 1.

(Example 3)
Since the effectiveness of the integration processing of the present disclosure has been verified from the viewpoint of identification calculation time, the experimental result will be described as a third embodiment.

FIG. 21 is a diagram showing a comparison of identification calculation times when the integration processing of the present disclosure is performed.

The first row shown in FIG. 21 shows the calculation time when 10 48 × 48 pixel images are identified using the GTSRB1 model structure (ORIGINAL) after learning with the learning image of the GTSRB data set. 8MS is shown. On the other hand, the second or third line shown in FIG. 21 is obtained by performing integration processing once or twice on the GTSRB1 model structure and learning with the learning image of the GTSRB data set (MERGE [1] or MERGE Using [2]), the calculation time of 14.1 MS or 12.6 MS when ten 48 × 48 pixel images are identified is shown.

In the fourth row shown in FIG. 21, ten 48 × 48 pixel images are identified using the GTSRB-3DNN model structure (ORIGINAL) after learning with the learning image of the GTSRB data set. A calculation time of 27.9 MS is shown. On the other hand, in the fifth line shown in FIG. 21, the integration processing of the present disclosure is performed on the GTSRB-3DNN model structure and is learned using the learning image of the GTSRB data set (MERGE [4]). A speed of 19.4 MS when 10 48 × 48 pixel images are identified is shown.

This shows that the identification calculation time has been improved in all experiments in which the integration processing of the present disclosure was performed.

[Effects]
As described above, according to the determination device 10 and the determination method thereof in the present embodiment, the structure of the convolutional neural network can be determined more simply (or automatically). More specifically, according to the determination device 10 and the determination method thereof in the present embodiment, the division processing and the integration processing are performed using, as an initial value, a filter constituting at least one convolutional layer of a convolutional neural network learned by deep learning. By repeating the above, the structure of the convolution neural network effective for image recognition can be determined easily or automatically.

Here, the division process is a process of converting effective filters to increase the number of filters that are likely to be effective for image recognition, and the integration process is a process of integrating redundant filters by clustering and leaving only effective filters. The conversion used in the division process may be selected from image conversions (conversion sets) known in the image processing field. Since consistent improvement can be expected, rotation conversion at a randomly determined angle and provision of Gaussian noise with a standard deviation determined at random may be selected and used for conversion. As a clustering method used in the integration process, a known clustering method such as a k-means method or an affinity-propagation method may be used.

Thereby, even a person other than an expert can use the determination device 10 and its determination method in the present embodiment to obtain a convolution neural network structure effective for image recognition and use it.

In addition, although the determination apparatus 10 etc. in this Embodiment demonstrated as performing a division | segmentation process and a determination process, you may perform at least one. Further, the order and the number of times of performing the division process and the determination process are not limited to the above-described example, and the user of the determination apparatus 10 may freely determine.

Further, when there are a plurality of convolution layers constituting the convolutional neural network as an initial value, the determination device 10 or the like in the present embodiment performs a division process on a plurality of filters constituting at least one convolution layer. At least one of the determination processes may be performed. In addition, after performing at least one of the division process and the determination process on the plurality of filters constituting the one convolution layer, the division process is performed on the plurality of filters constituting the convolution layer different from the one convolution layer. And at least one of the determination processes may be performed. That is, the determination device 10 or the like in the present embodiment may perform at least one of the division process and the determination process on a part or all of the filter of the convolutional neural network as the initial value.

As described above, the determination method according to the present disclosure has been described in the embodiment, but the subject or the device that performs each process is not particularly limited. It may be processed by a processor or the like (described below) embedded in a specific device located locally. Further, it may be processed by a cloud server or the like arranged at a location different from the local device.

Note that the present disclosure further includes the following cases.

(1) The above apparatus is specifically a computer system including a microprocessor, ROM, RAM, a hard disk unit, a display unit, a keyboard, a mouse, and the like. A computer program is stored in the RAM or hard disk unit. Each device achieves its functions by the microprocessor operating according to the computer program. Here, the computer program is configured by combining a plurality of instruction codes indicating instructions for the computer in order to achieve a predetermined function.

(2) A part or all of the constituent elements constituting the above-described apparatus may be configured by one system LSI (Large Scale Integration). The system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip, and specifically, a computer system including a microprocessor, ROM, RAM, and the like. . A computer program is stored in the RAM. The system LSI achieves its functions by the microprocessor operating according to the computer program.

(3) A part or all of the constituent elements constituting the above-described device may be constituted by an IC card or a single module that can be attached to and detached from each device. The IC card or the module is a computer system including a microprocessor, a ROM, a RAM, and the like. The IC card or the module may include the super multifunctional LSI described above. The IC card or the module achieves its function by the microprocessor operating according to the computer program. This IC card or this module may have tamper resistance.

(4) The present disclosure may be the method described above. Further, the present invention may be a computer program that realizes these methods by a computer, or may be a digital signal composed of the computer program.

(5) Further, the present disclosure provides a computer-readable recording medium such as a flexible disk, a hard disk, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a BD ( It may be recorded on a Blu-ray (registered trademark) Disc), a semiconductor memory, or the like. The digital signal may be recorded on these recording media.

In the present disclosure, the computer program or the digital signal may be transmitted via an electric communication line, a wireless or wired communication line, a network represented by the Internet, a data broadcast, or the like.

Further, the present disclosure may be a computer system including a microprocessor and a memory, and the memory may store the computer program, and the microprocessor may operate according to the computer program.

In addition, the program or the digital signal is recorded on the recording medium and transferred, or the program or the digital signal is transferred via the network or the like and executed by another independent computer system. You may do that.

(6) The above embodiment and the above modifications may be combined.

The present disclosure can be used for a determination apparatus and a determination method for determining a structure of a convolutional neural network effective for recognition, and in particular, a convolutional neural that can be executed even in an embedded system that is effective for image recognition and has less computational power than a personal computer system. The present invention can be used for a determination apparatus and a determination method for determining a network structure.

DESCRIPTION OF SYMBOLS 10 Determination apparatus 11 Acquisition part 12 Dividing part 13 Integration part 15 Output part 121 Random conversion part 122 Filter addition part 123 Identification performance evaluation part 131 Clustering part 132 Filter selection part

Claims

A determination method for determining the structure of a convolutional neural network,
An acquisition step of acquiring, as an initial value, N (N is a natural number of 1 or more) filters whose weights have been learned using the learning image group;
By adding a filter subjected to transformation used in the image processing field to at least one of the N filters, the N filters can be changed to M larger than the N filters (M is a natural number of 2 or more). Including a split step to increase the filter,
Decision method.
In the dividing step,
A division evaluation step of evaluating the identification performance of the M filters by using the learning image group to learn the weights of the M filters;
If the discrimination performance evaluated in the division evaluation step is less than or equal to the discrimination performance of the N filters, the division step is performed again.
The determination method according to claim 1.
Further, an integration step of integrating the M filters into L (L is a natural number of 1 or more) smaller than the M filters by clustering the M filters and selecting a cluster-centered filter. including,
The determination method according to claim 1 or 2.
In the integration step,
Clustering the M filters into L clusters predetermined using a k-means method;
The determination method according to claim 3.
In the integration step,
Clustering the M filters using an affinity propagation method;
The determination method according to claim 3.
The transformation includes a rotational transformation at a randomly determined angle;
In the dividing step,
Adding a filter that has undergone the rotational transformation to at least one of the N filters;
The determination method according to any one of claims 1 to 5.
The transformation includes the application of Gaussian noise with a standard deviation determined at random;
In the dividing step,
Adding a filter to which the Gaussian noise is applied to at least one of the N filters;
The determination method according to any one of claims 1 to 6.
The conversion includes a contrast conversion for converting to a randomly determined contrast ratio,
In the dividing step,
Adding a filter having undergone the contrast conversion to at least one of the N filters;
The determination method according to any one of claims 1 to 7.
The transformation includes a scale transformation that transforms to a randomly determined scale;
In the dividing step,
Adding the scaled filter to at least one of the N filters;
The determination method according to any one of claims 1 to 8.
A determination method for determining the structure of a convolutional neural network,
An acquisition step of acquiring, as an initial value, M (M is a natural number of 2 or more) filters whose weights have been learned using the learning image group;
An integration step of clustering the M filters and selecting a cluster-centered filter to integrate the M filters into L filters smaller than the M (L is a natural number of 1 or more). ,
Decision method.
A program for causing a computer to determine the structure of a convolutional neural network,
An acquisition step of acquiring, as an initial value, N (N is a natural number of 1 or more) filters whose weights have been learned using the learning image group;
By adding a filter subjected to transformation used in the image processing field to at least one of the N filters, the N filters can be changed to M larger than the N filters (M is a natural number of 2 or more). Including a split step to increase the filter,
program.