WO2021014746A1

WO2021014746A1 - Information processing method, information processing device, and information processing program

Info

Publication number: WO2021014746A1
Application number: PCT/JP2020/020612
Authority: WO
Inventors: 井手　直紀; アンドリューシン; 顕生早川
Original assignee: ソニー株式会社
Priority date: 2019-07-23
Filing date: 2020-05-25
Publication date: 2021-01-28

Abstract

Provided are an information processing method, information processing device, and information processing program which make it possible to improve inference precision for a classification problem and reduce model parameters and the amount of computation. In the information processing method according to the present disclosure, a computer (information processing device 1) uses a neural network (3) to infer the category to which input data belongs. The neural network (3) computes a feature vector from the input data and, on the basis of the feature vector, computes a probability or score for the category to which the input data belongs, using a decoding computation corresponding to prescribed error-correction coding.

Description

Information processing methods, information processing devices, and information processing programs

This disclosure relates to information processing methods, information processing devices, and information processing programs.

There is a technique for estimating the category to which data belongs in a classification problem by using a learning model constructed by machine learning such as supervised learning (see, for example, Patent Document 1). Machine learning that uses a neural network (deep neural network) as the learning model is often called deep learning. A neural network that handles a classification problem is composed of an arithmetic unit that calculates a feature vector from the data and an arithmetic unit that calculates, from the feature vector, the probability that the data belongs to each category.

Patent Document 1: JP-A-2009-228686

In machine learning classification problems, avoiding misclassification of the category to which the data belongs is a perennial issue. In deep learning, reducing the number of parameters and operations of the neural network is a further issue.

Therefore, this disclosure proposes an information processing method, an information processing device, and an information processing program that can improve the estimation accuracy of the classification problem and reduce the parameters and the amount of calculation of the model.

According to the present disclosure, an information processing method is provided. In the information processing method, a computer uses a neural network to estimate the category to which the input data belongs. The neural network calculates a feature vector from the input data and, based on the feature vector, calculates the probability or score of the category to which the input data belongs by using a decoding operation corresponding to a predetermined error correction coding.

An explanatory diagram of the learning neural network according to the first embodiment of the present disclosure.
A diagram showing an example of the generator matrix according to the present disclosure.
A diagram showing an example of the check matrix according to the present disclosure.
A flowchart showing an example of the learning algorithm according to the first embodiment of the present disclosure.
An explanatory diagram of the execution neural network according to the first embodiment of the present disclosure.
An image diagram of log-domain sum-product decoding according to the first embodiment of the present disclosure.
An image diagram of the neural network that estimates a label in the classification problem according to the first embodiment of the present disclosure.
An explanatory diagram of the BCJR decoding algorithm for the turbo code according to the present disclosure.
An explanatory diagram of the learning neural network according to the second embodiment of the present disclosure.
An explanatory diagram showing the correspondence between the execution procedure of the classification problem according to the present disclosure and the information communication procedure.
An explanatory diagram of the learning neural network according to the third embodiment of the present disclosure.
A flowchart showing an example of the learning algorithm according to the third embodiment of the present disclosure.
A flowchart showing an example of the learning algorithm according to the fourth embodiment of the present disclosure.
A schematic explanatory diagram of the configuration of the information processing device according to the present embodiment.
A hardware block diagram showing an example of a computer that realizes the functions of the information processing device according to the present embodiment.

The embodiments of the present disclosure will be described in detail below with reference to the drawings. In each of the following embodiments, the same parts are designated by the same reference numerals, so that duplicate description will be omitted.

[1. Background]
[1-1. Deep learning]
In the field of machine learning, deep learning, which learns neural network parameters from a large amount of data, has become dominant. In deep learning, parameters are learned by repeatedly updating them using the gradient with respect to the parameters (the gradient method) so as to reduce an objective function called the loss (loss function).

[1-2. Supervised learning]
Within machine learning, the technique of learning from pairs of data and the correct answer to be derived from that data is called supervised learning. This "correct answer" information is called a "label" in machine learning. Labels are usually assigned to each piece of data by hand.

[1-3. Classification problem]
If the label to be derived from the data is the category (class) to which the data belongs, the label is a category ID (a discrete value). A classification problem with more than two classes is sometimes called a multi-class classification problem. Multi-class classification problems usually assume that each piece of data belongs to exactly one of a predetermined set of classes.

In this case, one label indicating one of the classes is given to each piece of data. In deep learning, the neural network calculates, from the input data, a vector representing class-likeness (hereinafter also referred to as a logit vector).

At the time of learning, the parameters are learned so as to reduce the error between the classification executed by the neural network and the correct answer. In addition, when executing classification, the class to be classified is determined based on the vector representing the class-likeness.

[1-4. Multi-label problem]
A classification problem in which multiple labels are assigned to one piece of data is called a multi-label problem. In this case, the data is allowed to belong to one or more of the predetermined classes. In the following, a vector that has the same number of dimensions as the number of classes, with 1 in the components indicated by the labels and 0 elsewhere, is referred to as a label vector.

In a normal multi-class problem, the label vector has 1 for only one component and 0 for the others. Such a vector is called a one-hot vector. On the other hand, in the multi-label problem, the label vector has 1 at a plurality of components. Such a vector is called a multi-hot vector.

[1-5. Unknown classification problem]
The data may not fit into any of the predetermined classes. In such a case, the label vector becomes the zero vector. Alternatively, an additional class meaning "none of the predetermined classes" may be introduced so that the problem is again a multi-class classification problem.

[2. Problems to be solved]
[2-1. Improvement of classification accuracy]
In the classification problem, avoiding misclassification of the category to which the data belongs is a perennial challenge. In a neural network, class-likeness is expressed by continuous values. However, these values are sensitive to data noise, data diversity, parameter deviation, and the like, and a classification error occurs when the fluctuation becomes large.

Therefore, there is a need for a method that can robustly classify data against noise, diversity, and parameter deviations. That is, an object of the present disclosure is to provide an information processing method for reducing errors in a classification problem of machine learning.

[2-2. Model weight reduction]
In deep learning, it is always a challenge to reduce the parameters and operations of the neural networks that make up the model.

The neural network that estimates the category to which the data belongs consists of a calculation unit that calculates the feature vector from the data and a calculation unit that calculates, from the feature vector, the degree to which the data belongs or does not belong to each category (hereinafter referred to as the class score or probability).

Of these, the process of calculating the class score from the feature vector has many parameters to be learned, and the number of operations is also large accordingly. Therefore, an object of the present disclosure is to reduce the parameters and operations of the process of calculating the class score from the feature vector in the deep learning neural network.

[2-3. Error correction]
Error correction codes are known as a technique for transmitting information accurately in communication technology. In communication between remote locations, information is binary-encoded and transmitted as a physical signal, and the "original information" is restored from the received signal. At this time, noise in the transmission of the physical signal may cause errors in the transmitted information. Error correction is one of the techniques for reducing such errors.

Here, the classification problem is considered to be a problem of restoring a class that is "original information" from "physical signals" such as image data and audio data obtained by converting "information" that represents a class. Then, since the classification error can be considered as an information error, it can be expected that the classification error can be reduced by using the error correction technology. Therefore, the present disclosure provides a method of reducing errors in a classification problem in machine learning by using an error correction technique in the communication field.

[2-4. ECOC (Error Correcting Output Code)]
ECOC is known as a precedent for using error correction technology in machine learning classification problems. This technique solves a multi-class problem as a combination of two-class problems. The classification result is made robust by combining the results of multiple two-class classifiers, each of which treats a different combination of classes as its positive examples.

However, because ECOC decodes using the Hamming distance, high-performance error correction decoding methods that use probabilities are not applicable to it. Therefore, the present disclosure provides a method of reducing errors by using an error correction technique that uses probabilities in a classification problem of deep learning.

[2-5. Development framework for deep learning]
In the development of deep learning, it is common to use an application called a development framework that selects and combines function layer groups required for a neural network to be deep-learned and optimization solver groups to be used during learning.

In such a development framework, the stacking of layers of a neural net is programmed in a script language (for example, Python), or visually programmed via a graphical user interface (GUI). Such a deep learning framework does not yet provide a layer that utilizes error correction technology. Therefore, the present disclosure provides a function layer that realizes an error correction technique.

[3. Solution]
The present disclosure provides the following method as a solution to a problem of improving performance by using probabilistic error correction in deep learning.

[3-1. Learning Encoded Labels]
The first solution is a method of combining learning by the ECOC method with probability decoding as run-time error correction.

In the first solution, LDPC (Low Density Parity Check) coding or turbo coding is used as the coding method so that probability decoding can be used. The coding is implemented as a coding layer so that it can be used as a deep learning layer.

As the actual probability decoding, maximum a posteriori decoding, the BCJR (Bahl-Cocke-Jelinek-Raviv) method, or the sum-product method is used. Furthermore, by preparing functions that realize these processes as deep learning layers usable in an execution network, a consistent configuration as a deep learning neural network can be realized. This makes it possible to correct errors using probabilities and to generate such neural networks using frameworks.

[3-2. Error back propagation of error correction layer]
The second solution provides the error correction layer of the first solution (maximum a posteriori decoding, the BCJR method, sum-product decoding, and the like) as a deep learning layer that supports error backpropagation, so that it can be used in the learning network. By using such a deep learning layer, the computer trains the neural network using error backpropagation.

By performing error back propagation through the error correction layer in this way, a coding method corresponding to the reverse processing of error correction is embedded in the parameters of the neural network. In order to realize the error correction layer as a function layer capable of error backpropagation, the error correction layer is realized only by a combination of functions capable of error back propagation. Then, this combination itself is put together to form an error correction layer again.

Further, as a way of making error correction technology easy to use in a deep learning framework, the present disclosure adds an "error correction layer" to the framework. By providing error correction in the framework in this way, the developer can benefit from the performance improvement of error correction without having to build the detailed configuration of the error correction technology.

[4. Embodiment]
[4-1. First Embodiment]
[4-1-1. Learning with encoded labels]
The first embodiment is an example in which the label is used as a vector encoded as a channel code. FIG. 1 is an explanatory diagram of a learning neural network according to the first embodiment of the present disclosure. In FIG. 1, "t" is the label of the classification problem, "x" is the data of the classification problem, and "Loss" is the value of the loss function.

The configuration of the learning neural network shown in FIG. 1 is as follows.
Label → Binary vectorization (encoding layer) → Parity vectorization (encoding layer) → Encoded label.
Data → Feature extraction network (feature extraction layer) → Logit calculation network (fully connected layer) → Logit vector.
Loss function (loss layer) → Cross entropy of the logit vector and the label vector with parity.

First, the network (encoding layer) related to labels will be described. The first step applied to a label is binary vectorization. Possible methods of binary vectorization include, for example, one-hot vectorization, multi-hot vectorization, and binary number representation.

When the classification problem is multi-class classification, binary vectorization can be considered as one-hot vectorization or binary numbering. For example, when the number of classes is 10 and the label is 3, the binary vectorization is performed as follows. One-hot vector: (0,0,0,1,0,0,0,0,0,0), binary vector: (0,0,1,1).

Also, in the case of the multi-label problem, multi-hot vectorization can be considered. For example, when the number of classes is 10 and the labels are 3 and 7, the binary vectorization is performed as follows. Multi-hot vector: (0,0,0,1,0,0,0,1,0,0). Further, in the case of a problem including an unknown class, the multi-hot vector is set to 0 vector. Multi-hot vector: (0,0,0,0,0,0,0,0,0,0).
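The binary vectorizations described above can be sketched as follows. This is a minimal illustration, not the encoding layer of the disclosure itself; the function names `one_hot`, `multi_hot`, and `binary_number` are hypothetical, and class IDs are assumed to start at 0.

```python
def one_hot(label, num_classes):
    """Vector with 1 at the class ID and 0 elsewhere."""
    return [1 if i == label else 0 for i in range(num_classes)]


def multi_hot(labels, num_classes):
    """Vector with 1 at every listed class ID; an empty list gives the zero vector."""
    present = set(labels)
    return [1 if i in present else 0 for i in range(num_classes)]


def binary_number(label, num_classes):
    """Class ID written as a fixed-width binary number, most significant bit first."""
    width = max(1, (num_classes - 1).bit_length())
    return [(label >> shift) & 1 for shift in reversed(range(width))]


print(one_hot(3, 10))         # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(binary_number(3, 10))   # [0, 0, 1, 1]
print(multi_hot([3, 7], 10))  # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
print(multi_hot([], 10))      # unknown class: zero vector
```

The printed vectors match the examples given in the text for 10 classes.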

Subsequently, the binary vector of the label is encoded with parity. Here, parity is an error check code generated from the original information. Like the original information, parity is usually a binary code sequence. In the present embodiment, a low density parity check code is appended to the binary-encoded label, and the result is used as the label vector with parity. Label vectors with parity are usually multi-hot vectors. As described above, in the present disclosure, the error correction code used for the error correction coding is a low density parity check code (hereinafter referred to as a "parity check code").

As a method of appending the parity check code to the binary vector of the label, for example, Hamming coding, LDPC coding, or turbo coding can be considered. Thus, in the present disclosure, a turbo code can also be used as the error correction code used for the error correction coding. When Hamming coding or LDPC coding is used, a memory for storing a pair of a generator matrix and a check matrix is prepared.

For the generator matrix, let the length of the original code be m, the length of the parity code be k, and the length of the code with parity be n = m + k. For example, an m × n matrix G as represented by the following equation (1) is used.

$$G = \begin{pmatrix} I_m & Q \end{pmatrix} \tag{1}$$

Here, I_m is the m-dimensional identity matrix. Q is an m × k matrix whose entries are only 0 or 1.

When the length of the original code is 10 and the length of the parity code is 10, for example, a matrix as shown in FIG. 2 is used. As the check matrix paired with this generator matrix, a matrix represented by the following equation (2) can be considered.

The check matrix corresponding to the generator matrix shown in FIG. 2 is the matrix shown in FIG.
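The pairing of a generator matrix with a check matrix can be illustrated with a small sketch. Equation (2) is not reproduced in this text, so the sketch assumes the standard systematic form, in which the generator matrix is the identity I_m followed by Q and the check matrix is the transpose of Q followed by I_k. The 4 × 3 matrix `Q` below is a hypothetical example (a (7,4) Hamming code), not the matrix of FIG. 2.

```python
def identity(size):
    return [[1 if r == c else 0 for c in range(size)] for r in range(size)]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def hstack(left, right):
    """Concatenate two matrices with the same number of rows, column-wise."""
    return [row_l + row_r for row_l, row_r in zip(left, right)]


def matmul_mod2(a, b):
    """Matrix product over GF(2)."""
    return [[sum(x * y for x, y in zip(row, col)) % 2 for col in zip(*b)]
            for row in a]


# Hypothetical parity part Q with m = 4 message bits and k = 3 parity bits.
Q = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]

G = hstack(identity(4), Q)             # m x n generator matrix, n = m + k = 7
H = hstack(transpose(Q), identity(3))  # k x n check matrix (assumed form)

# Every row of G is a codeword, so every parity check must evaluate to 0.
syndrome = matmul_mod2(G, transpose(H))
print(syndrome)  # [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The all-zero product confirms that, under this assumed form, the check matrix accepts every codeword produced by the generator matrix.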

Once the generator matrix is determined, the vector with parity c (channel code) can be generated from the binary vector m (message) by the following equation (3).

$$c = m^{T} G \bmod 2 \tag{3}$$

Here, m^T G is a matrix product, and mod 2 denotes the remainder after division by 2. In the first embodiment of the present disclosure, the vector with parity generated in this way is used as the label vector with parity. Although the addition of parity by a Hamming code or an LDPC code has been described here, the parity may instead be added by a convolutional code or a turbo code.
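As a worked example of this encoding step, the following sketch multiplies a binary message by a generator matrix and reduces the result modulo 2. The matrix `G` is a hypothetical (7,4) example in the form identity-then-Q, and `encode` is an illustrative name.

```python
# Hypothetical systematic generator matrix: 4 message bits followed by 3 parity bits.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]


def encode(message, generator):
    """Return the code with parity: message times generator, modulo 2."""
    code_length = len(generator[0])
    return [sum(bit * row[j] for bit, row in zip(message, generator)) % 2
            for j in range(code_length)]


# A multi-hot label (classes 0, 2 and 3 out of 4) and its label vector with parity.
print(encode([1, 0, 1, 1], G))  # [1, 0, 1, 1, 0, 1, 0]

# A one-hot label simply selects one row of the generator matrix.
print(encode([0, 0, 1, 0], G))  # [0, 0, 1, 0, 0, 1, 1]
```

Because the generator matrix is systematic, the first four bits of each result are the original label bits and the remaining three are the parity.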

Next, the processing related to data will be explained. As the feature extraction network (feature extraction layer) for calculating the feature amount from the data, the same network as the feature vector calculation network of a normal neural network is used.

For example, if the input data is an image or audio, a neural network that combines convolution layers and nonlinear activation layers can be considered. If the input data is a symbol string such as natural language, a neural network that includes an embedding layer or the like can be considered.

For the neural network that calculates the logit vector (logarithmic odds vector) from the features, for example, a fully connected network (fully connected layer) is used. Here, in a normal neural network for classification problems, the number of dimensions of the input vector is the number of dimensions of the feature vector, and the number of dimensions of the output vector is the number of classes in the fully connected network.

However, in the present disclosure, when the label is a one-hot vector or a multi-hot vector, the number of dimensions of the output vector of the fully connected network is the number of classes plus the number of parity bits. When the label is a binary number vector, the number of dimensions of the output vector of the fully connected network is the number of bits in the binary representation of the number of classes plus the number of parity bits. In other words, in the present disclosure, the feature vector is a vector whose number of dimensions corresponds to the code length of the error correction code used for the error correction coding (message length (original code length) + parity length (parity code length)).
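The dimension arithmetic above can be written out as a short sketch. The function name `logit_dimension` and the encoding names are hypothetical; the binary width is assumed to be the number of bits needed to represent the largest class ID.

```python
def logit_dimension(num_classes, parity_length, encoding):
    """Output dimension of the fully connected network: message length + parity length."""
    if encoding in ("one_hot", "multi_hot"):
        message_length = num_classes
    elif encoding == "binary":
        message_length = max(1, (num_classes - 1).bit_length())
    else:
        raise ValueError(f"unknown encoding: {encoding}")
    return message_length + parity_length


print(logit_dimension(10, 10, "one_hot"))  # 20
print(logit_dimension(10, 10, "binary"))   # 14
```

For 10 classes and 10 parity bits, the output has 20 dimensions with one-hot or multi-hot labels and 14 dimensions with binary number labels.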

It is not necessary to clearly distinguish between the feature amount calculation network and the logit calculation network. It suffices that a neural network capable of calculating the logit vector from the input data is built. Written in simplified form, this process can be expressed as the following equation (4).

$$h = f_{\theta}(x) \tag{4}$$

Here, h represents the logit vector, x represents the input data, f_θ represents the processing of the neural network, and θ represents the parameters of the neural network.

Finally, the loss function will be described. Since this is a classification problem, the standard cross entropy is used as the loss function. Because the encoded vector is a binary vector, the binary cross entropy is calculated. In practice, the per-sample value given by the following equation (5), averaged over the samples, is used as the loss.

Here, σ is a sigmoid function.
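A minimal sketch of a sigmoid (binary) cross entropy between a logit vector and a label vector with parity is shown below. Equation (5) is not reproduced in this text, so the sketch uses the standard form, summed over code bits and averaged over samples; that reduction, and the function names, are assumptions.

```python
import math


def sigmoid_cross_entropy(logits, code_bits):
    """Binary cross entropy of one sample, summed over the bits of the code.

    Uses the numerically stable identity
    -c*log(s(h)) - (1-c)*log(1-s(h)) = max(h, 0) - h*c + log(1 + exp(-|h|)),
    where s is the sigmoid function.
    """
    total = 0.0
    for h, c in zip(logits, code_bits):
        total += max(h, 0.0) - h * c + math.log1p(math.exp(-abs(h)))
    return total


def batch_loss(batch_logits, batch_codes):
    """Average of the per-sample losses over a mini-batch."""
    losses = [sigmoid_cross_entropy(h, c)
              for h, c in zip(batch_logits, batch_codes)]
    return sum(losses) / len(losses)


confident = sigmoid_cross_entropy([4.0, -4.0, 4.0], [1, 0, 1])
wrong = sigmoid_cross_entropy([4.0, -4.0, -4.0], [1, 0, 1])
print(confident < wrong)  # True: a logit that disagrees with its bit raises the loss
```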

In the learning neural network, learning parameters are embedded in the feature extraction network. When training is performed to minimize the loss function, the training parameters are updated to the trained parameters.

Learning using encoded labels is performed by mini-batch gradient descent or a derived algorithm such as Adam. FIG. 4 is a flowchart showing an example of the learning algorithm according to the first embodiment of the present disclosure.

As shown in FIG. 4, in the learning algorithm according to the first embodiment, first, a learning network is configured (step S101). Subsequently, the label and the data are sampled from the training data set to generate a mini-batch (step S102).

After that, for each mini-batch, the loss is obtained, the gradient of the loss function with respect to the parameters is obtained (step S103), and the parameters are updated according to the gradient (step S104). Subsequently, it is determined whether or not a predetermined convergence condition is satisfied (step S105).

When the convergence condition is not satisfied (step S105, No), the process returns to step S102, and the processing of steps S102 to S105 is repeated until the convergence condition is satisfied. When the convergence condition is satisfied (step S105, Yes), the process ends.

Although the case where the processes of steps S102 to S105 are repeated until the convergence condition is satisfied has been described here, the processes of steps S102 to S105 may be repeated up to a predetermined maximum number of repetitions.
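The loop of steps S101 to S105 can be sketched on a toy problem as follows. Everything specific here is an assumption made for illustration: the network is a single linear layer, the data are noisy one-hot features for 4 classes, the hypothetical (7,4) generator matrix `G` supplies the label vectors with parity, and plain mini-batch gradient descent with a loss threshold stands in for the optimizer and convergence condition.

```python
import math
import random

random.seed(0)

# Hypothetical generator matrix; row k is the label vector with parity of class k.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
NUM_CLASSES, CODE_LENGTH = 4, 7


def sigmoid(v):
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def sample_batch(size):
    """S102: sample (data, encoded label) pairs; the data are noisy one-hot features."""
    batch = []
    for _ in range(size):
        label = random.randrange(NUM_CLASSES)
        x = [(1.0 if i == label else 0.0) + random.gauss(0.0, 0.1)
             for i in range(NUM_CLASSES)]
        batch.append((x, G[label]))
    return batch


def forward(W, b, x):
    """Logit vector with parity, h = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def train(max_steps=300, batch_size=16, learning_rate=0.5, tolerance=0.5):
    # S101: configure the learning network (here a single linear layer).
    W = [[0.0] * NUM_CLASSES for _ in range(CODE_LENGTH)]
    b = [0.0] * CODE_LENGTH
    losses = []
    for _ in range(max_steps):
        batch = sample_batch(batch_size)
        grad_W = [[0.0] * NUM_CLASSES for _ in range(CODE_LENGTH)]
        grad_b = [0.0] * CODE_LENGTH
        loss = 0.0
        # S103: loss and its gradient with respect to the parameters.
        for x, code in batch:
            h = forward(W, b, x)
            for j in range(CODE_LENGTH):
                loss += (max(h[j], 0.0) - h[j] * code[j]
                         + math.log1p(math.exp(-abs(h[j])))) / batch_size
                delta = (sigmoid(h[j]) - code[j]) / batch_size
                grad_b[j] += delta
                for i in range(NUM_CLASSES):
                    grad_W[j][i] += delta * x[i]
        # S104: update the parameters along the negative gradient.
        for j in range(CODE_LENGTH):
            b[j] -= learning_rate * grad_b[j]
            for i in range(NUM_CLASSES):
                W[j][i] -= learning_rate * grad_W[j][i]
        losses.append(loss)
        # S105: stop on convergence, or after max_steps repetitions.
        if loss < tolerance:
            break
    return W, b, losses


W, b, losses = train()
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, steps {len(losses)}")
```

The `max_steps` argument corresponds to the variant described above in which the loop stops after a predetermined maximum number of repetitions.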

As described above, in the first embodiment, the computer executes the learning algorithm shown in FIG. 4 by using the learning neural network shown in FIG. Specifically, the computer converts the label of the classification problem for learning into a binary vector, adds a parity check code to the binary vector, and executes a process of calculating the label vector with parity.

In addition, the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning. Subsequently, the computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity.

Then, the computer learns so that the value of the loss function between the logit vector with parity and the label vector with parity is minimized, and executes a process of updating the parameters of the neural network used for label estimation.

As described above, in the first embodiment, the label and classification problem data used for learning are vectorized, and a parity check code is further added to machine-learn the neural network parameters. Thereby, in the first embodiment, it is possible to construct a neural network capable of robust classification with respect to data noise, diversity, and parameter deviation.

[4-1-2. Network for execution]
FIG. 5 is an explanatory diagram of an execution neural network according to the first embodiment of the present disclosure. In FIG. 5, "x" is the data of the classification problem, and "r" is the estimation result of the label of the classification problem.

As shown in FIG. 5, the configuration of the execution neural network is as follows. Data → Feature extraction network (feature extraction layer) → Logit calculation network with parity (fully connected layer) → Error correction network (error correction layer) → Logit vector.

Here, the feature extraction network has the same configuration as the feature extraction network of the learning neural network. The logit calculation network with parity has the same configuration as the logit calculation network in the learning neural network. As the parameters of these networks, the parameters learned in the learning network are used.

In the execution neural network, an error correction network (error correction layer) is newly prepared. An error correction network is a network that calculates a logit vector without parity from a logit vector with parity. This process is a process of calculating the logarithmic odds related to the original code from the logarithmic odds generated from the communication path signal in the error correction network.

As the decoding operation corresponding to this error correction coding, maximum a posteriori decoding (maximum likelihood decoding algorithm), sum-product decoding (sum-product algorithm), BCJR decoding (BCJR algorithm), and the like can be considered. Sum-product decoding includes, for example, probability-domain sum-product decoding and log-domain sum-product decoding.

Log-domain sum-product decoding is an algorithm that calculates log odds from the input channel code and repeats variable node processing and check node processing to estimate the log odds of each bit of the "original" code string with parity.

FIG. 6 is an image diagram of log-domain sum-product decoding according to the first embodiment of the present disclosure. FIG. 7 is an image diagram of a neural network that estimates labels in the classification problem according to the first embodiment of the present disclosure.

In FIGS. 6 and 7, each variable node is shown as a rectangle and each check node is shown as a circle. The group of straight lines connecting the variable nodes and the check nodes in FIGS. 6 and 7 represents the parity check matrix.

A variable node performs the variable node processing described later on its input and outputs the result to the check nodes in the subsequent stage. A check node performs the check node processing described later on its input, using the check matrix, and outputs the result to the variable nodes in the subsequent stage.

In the case of the Gaussian channel, the log odds are calculated by the following equation (6) using the noise intensity and the signal intensity of the Gaussian channel.

Here, x_j is the intensity of the j-th signal, and n is the intensity of the noise. If the noise intensity is unknown, a value around 1 is used.

The variable node processing updates the logarithmic odds based on the following equation (7).

Here, σ_i has an initial value of 0.

The check node process performs a parity check based on the following equation (8).

If this process is repeated an appropriate number of times, the log odds r_j of the posterior probability of each bit of the original code with parity are obtained. The log odds vector excluding the parity part corresponds to the log odds of the original code string.
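The alternation of variable node processing and check node processing can be sketched as below. Equations (6) to (8) are not reproduced in this text, so this is the textbook log-domain sum-product algorithm with the tanh rule, not necessarily the exact update of the disclosure. Two assumptions are made: the input log odds are defined as log P(bit = 1) / P(bit = 0), matching a logit passed through a sigmoid, and the check matrix `H` is the hypothetical (7,4) example rather than the matrix in the figures.

```python
import math

# Hypothetical check matrix of a (7,4) code: 3 checks over 7 code bits.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]


def sum_product_decode(logits, check_matrix, iterations=5):
    """Return posterior log odds (same sign convention as the input logits)."""
    n = len(logits)
    # The tanh rule below is written for log P(0)/P(1), so flip the sign.
    channel = [-v for v in logits]
    edges = [(i, j) for i, row in enumerate(check_matrix)
             for j, bit in enumerate(row) if bit]
    check_to_var = {edge: 0.0 for edge in edges}

    def totals():
        sums = list(channel)
        for (i, j), message in check_to_var.items():
            sums[j] += message
        return sums

    for _ in range(iterations):
        # Variable node processing: total belief minus the message from that check.
        belief = totals()
        var_to_check = {(i, j): belief[j] - check_to_var[(i, j)]
                        for (i, j) in edges}
        # Check node processing: combine the other variables on the same check.
        for (i, j) in edges:
            product = 1.0
            for (other_i, other_j) in edges:
                if other_i == i and other_j != j:
                    product *= math.tanh(var_to_check[(other_i, other_j)] / 2.0)
            product = max(min(product, 1.0 - 1e-12), -1.0 + 1e-12)
            check_to_var[(i, j)] = 2.0 * math.atanh(product)

    return [-v for v in totals()]


# Logits for the codeword [1, 0, 1, 1, 0, 1, 0], with bit 2 weakly pointing the wrong way.
noisy_logits = [2.0, -2.0, -0.5, 2.0, -2.0, 2.0, -2.0]
posterior = sum_product_decode(noisy_logits, H)
decoded = [1 if v > 0 else 0 for v in posterior]
print(decoded)      # hard decision on all 7 bits
print(decoded[:4])  # original code bits, with the parity part removed
```

In this example the parity checks outweigh the weak wrong logit on bit 2, so the hard decision recovers the transmitted codeword.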

In this disclosure, this log-domain sum-product method is used as the error correction layer. The input vector x of this layer is the log odds vector with parity h obtained by equation (4). The output is the posterior log odds vector r obtained after repeating equations (7) and (8). The output may also be the vector obtained by cutting out from r only as many components as the original number of classes.

Further, since the log odds in equation (6) depend on the noise intensity, the variance and the like may be calculated for each code bit at the time of learning, and the noise intensity may be corrected based on the variance.

From the logarithmic odds vector obtained as described above, the probability of the class and the classification result can be estimated. For example, in the case of multi-label classification or multi-class classification where unknown classes can occur, the probability of the class is determined by passing through the sigmoid function. In addition, the label can be estimated by discriminating the logarithmic odds and the probability for each class with an appropriate threshold value.
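A minimal sketch of this final step, assuming a threshold of 0.5 on the probability (the disclosure only says "an appropriate threshold value") and using the hypothetical function name `estimate_labels`:

```python
import math


def sigmoid(v):
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def estimate_labels(log_odds, threshold=0.5):
    """Return per-class probabilities and the class IDs whose probability reaches the threshold."""
    probabilities = [sigmoid(v) for v in log_odds]
    labels = [k for k, p in enumerate(probabilities) if p >= threshold]
    return probabilities, labels


# Log odds for 10 classes after error correction, with the parity part already removed.
_, labels = estimate_labels([-3.0, -2.0, -1.0, 2.5, -4.0, -2.0, -1.0, 1.2, -3.0, -2.0])
print(labels)  # [3, 7]

# If no class reaches the threshold, the data is treated as an unknown class.
_, none_found = estimate_labels([-3.0] * 10)
print(none_found)  # []
```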

The estimation of the probability and label for each class described above can also be realized by directly calculating the posterior probability and performing MAP (Maximum A Posteriori probability) decoding using the calculation result, or by the BCJR decoding algorithm.

FIG. 8 is an explanatory diagram of the turbo code BCJR decoding algorithm according to the present disclosure. As shown in FIG. 8, in the BCJR decoding algorithm, for example, data interleaving, convolutional code propagation, and logarithmic odds calculation are sequentially repeated.

As described above, in the first embodiment, the computer estimates the label from the data of the classification problem by using the execution neural network shown in FIG. Specifically, the computer executes a process of extracting the feature amount of the data from the data of the classification problem.

Subsequently, the computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity. After that, the computer executes a process of performing error correction of the logit vector with parity based on the parity check code.

Then, the computer executes a process of estimating the label of the classification problem based on the logit vector obtained by excluding the parity check code from the logit vector with parity after error correction.

As described above, in the first embodiment, the data of the classification problem is vectorized, a parity check code is added, and error correction is performed based on the parity check code to estimate the label. It is therefore possible to perform classification that is robust against data noise, diversity, and parameter deviation. Consequently, according to the first embodiment, it is possible to improve the estimation accuracy of the label of the classification problem.

[4-2. Second Embodiment]
The second embodiment is an example of using the execution neural network of the first embodiment as a learning neural network. FIG. 9 is an explanatory diagram of the learning neural network according to the second embodiment of the present disclosure.

In FIG. 9, "x" is the data of the classification problem, "t" is the label of the classification problem, "h" is the feature vector extracted from the data of the classification problem, and "r" is the class score of the label after error correction.

The configuration of the learning neural network shown in FIG. 9 is as follows.
Data → Feature extraction network (feature extraction layer) → Logit calculation network with parity (feature extraction layer) → Error correction network (error correction layer) → Logit vector.
Label → Binary vectorization (only for multi-label problems and problems with unknown class data).
Loss function (loss layer) → Cross entropy between the logit vector and the label.

In the second embodiment, the feature amount is extracted and the parity check code is added in the feature extraction layer. Therefore, part of the feature vector input to the error correction layer corresponds to the original code, and part corresponds to the check code (parity). As a result, the error correction layer can perform error correction of the original code by, for example, sum-product decoding or BCJR decoding.

The difference from the first embodiment is that parity coding of the label is not performed unless the classification problem is a multi-label problem or a problem with unknown class data. For the loss, the sigmoid cross entropy of Eq. (5) is used in the multi-label problem and the unknown class problem.

In the case of a multi-class problem, the softmax cross entropy calculated by the following equation (9) is used.

Here, r is the vector obtained by extracting only as many components as there are classes from the posterior log-odds vector r obtained after iterating equations (7) and (8). Further, t is a label that has not been converted into a binary vector.
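For reference, the softmax cross entropy over the truncated vector can be sketched as below. This uses the standard definition (the negative log of the softmax probability of the correct class); equation (9) itself is not reproduced in this text. The function name `softmax_cross_entropy` and the assumption that the class components are the first `n_classes` entries of r are hypothetical.

```python
import math
from typing import List


def softmax_cross_entropy(r: List[float], t: int, n_classes: int) -> float:
    """Softmax cross entropy between class scores and an integer label t.

    Assumption: the class scores are the first n_classes entries of r;
    the remaining (parity) entries are ignored.
    """
    scores = r[:n_classes]
    m = max(scores)  # subtract the maximum for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[t]


# Three equal class scores give a loss of log(3) for any label.
loss = softmax_cross_entropy([0.0, 0.0, 0.0, 9.0, 9.0], t=1, n_classes=3)
```

Because t is used directly as an index, the label does not need to be converted into a binary vector in this case.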

In this configuration, parity coding of the label is not performed. However, because the gradient is propagated to the parameters by error back propagation through equations (7) and (8), the parity coding information is embedded in the parameters as processing equivalent to the coding. Equations (7) and (8) are both functions through which the gradient can be back-propagated, that is, functions for which error back propagation is defined. As described above, the decoding operation corresponding to the error correction coding is composed of a combination of operations capable of error back propagation. As the execution neural network, one having the same configuration as the execution neural network of the first embodiment may be used.

As described above, in the second embodiment, the computer learns the parameters of the learning neural network using the learning neural network shown in FIG. 9. Specifically, the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning.

Subsequently, the computer executes a process of converting the feature quantity into a logit vector, adding a parity check code to the logit vector to calculate the logit vector with parity, and correcting errors in the logit vector with parity based on the parity check code.

Then, the computer executes a process of learning so as to minimize the value of the loss function between the error-corrected logit vector with parity and the label of the classification problem for learning, and updating the parameters of the neural network used for estimating the label.

As a result, according to the second embodiment, it is possible to construct a neural network capable of classification that is robust against data noise, diversity, and parameter deviation without performing binary vectorization of the labels used for learning.

Here, with reference to FIG. 10, the correspondence between the execution procedure of the classification problem according to the present disclosure and the information communication procedure will be described. FIG. 10 is an explanatory diagram showing a correspondence relationship between the execution procedure of the classification problem according to the present disclosure and the information communication procedure.

As shown in FIG. 10, the original code before transmission in information communication can be considered as a class in the classification problem. Then, the procedure of coding and modulating the original code in the information communication and transmitting it via the physical layer serving as the transmission path corresponds to the procedure of extracting the feature amount from the data of the classification problem by the feature amount extraction layer.

Here, in information communication, noise may be mixed in the received signal at the physical layer. In the classification problem, such a phenomenon corresponds to a phenomenon in which noise is mixed in the feature amount. In information communication, the received signal is demodulated and decoded to correct errors, and the code probability is calculated to eliminate the influence of noise.

Therefore, in the present disclosure, the feature amount extracted from the data of the classification problem is vectorized, a parity check code is added as in the case of information communication, and the parity layer, which is the error correction layer described above, performs error correction of the feature amount vector and calculates the class probability. As described above, in the information processing method according to the present disclosure, for example, a computer estimates the category to which the input data belongs by using a neural network. The neural network calculates a feature vector from the input data, and based on the feature vector, calculates the probability or score of the category to which the input data belongs by using a decoding operation corresponding to a predetermined error correction coding. Thereby, in the present disclosure, it is possible to improve the estimation accuracy of the label even when the data of the classification problem contains noise.

[4-3. Third Embodiment]
The third embodiment describes a method of realizing unsupervised learning in combination with the first embodiment and the second embodiment. Since the execution network is the same as that of the first embodiment and the second embodiment, the description thereof is omitted here. The learning network is a combination of the first embodiment and the second embodiment as follows.

FIG. 11 is an explanatory diagram of the learning neural network according to the third embodiment of the present disclosure. In addition, "x" shown in FIG. 11 is the data of the classification problem. "Loss" is the value of the loss function.

The configuration of the learning network shown in FIG. 11 is as follows. Data → Feature extraction network (feature extraction layer) → Logit calculation network with parity (fully connected layer) → Error correction network (error correction layer) → Logit vector (encoding layer) → Prediction label.

Prediction label → Binary vectorization (encoding layer) → Label vectorization with parity (encoding layer). Loss function (loss layer) → Cross entropy between the logit vector with parity and the label vector with parity.

Here, the correct label is not necessary, and instead the label estimated from the data through error correction is used. Then, a label vector with parity created based on this estimated label is used for loss calculation.

When the prediction label is calculated in the error correction layer, it may be calculated as a prediction label with parity. In this case, the step of converting the prediction label into a label vector with parity is unnecessary.

FIG. 12 is a flowchart showing an example of the learning algorithm according to the third embodiment of the present disclosure. As shown in FIG. 12, in the learning algorithm according to the third embodiment, first, a learning network is configured (step S201).

Subsequently, a mini-batch is sampled from the unlabeled data set (step S202). After that, the mini-batch is input to the learning network, and the logit vector with parity and the prediction label are estimated (step S203).

Subsequently, the label vector with parity is calculated from the predicted label (step S204). After that, the logit vector with parity and the label vector with parity are input to the loss function (step S205). Then, the error is back-propagated from the loss to each parameter to update the parameter (step S206).

Subsequently, it is determined whether or not the predetermined convergence condition is satisfied (step S207). When the convergence condition is not satisfied (step S207, No), the process returns to step S202, and the processing of steps S202 to S206 is repeated until the convergence condition is satisfied. When the convergence condition is satisfied (step S207, Yes), the process ends.

Although the case where the processes of steps S202 to S206 are repeated until the convergence condition is satisfied has been described here, the processes of steps S202 to S206 may be repeated up to a predetermined maximum number of repetitions.
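The loss computation of steps S203 to S205 can be sketched for a single sample as follows. This is a hedged illustration, not the disclosed implementation: the (7,4) Hamming encoder, the threshold of 0 used to obtain the prediction label, and the names `encode_with_parity`, `sigmoid_cross_entropy`, and `pseudo_label_loss` are assumptions introduced for the example.

```python
import math
from typing import List, Tuple


def encode_with_parity(bits: List[int]) -> List[int]:
    """Append three parity bits of an illustrative (7,4) Hamming code."""
    d0, d1, d2, d3 = bits
    return [d0, d1, d2, d3, d0 ^ d1 ^ d3, d0 ^ d2 ^ d3, d1 ^ d2 ^ d3]


def sigmoid_cross_entropy(logits: List[float], targets: List[int]) -> float:
    """Numerically stable sigmoid cross entropy summed over all bits."""
    return sum(max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
               for z, t in zip(logits, targets))


def pseudo_label_loss(logits_with_parity: List[float],
                      n_data: int = 4) -> Tuple[List[int], List[int], float]:
    # Prediction label taken from the non-parity part (assumed threshold 0).
    predicted = [1 if z > 0.0 else 0 for z in logits_with_parity[:n_data]]
    # Step S204: label vector with parity computed from the prediction.
    target = encode_with_parity(predicted)
    # Step S205: loss between logit vector with parity and label vector.
    loss = sigmoid_cross_entropy(logits_with_parity, target)
    return predicted, target, loss


consistent = [2.0, -1.0, 0.5, 3.0, -0.2, 1.5, -2.0]
inconsistent = [2.0, -1.0, 0.5, 3.0, 0.2, -1.5, 2.0]
pred, target, loss_ok = pseudo_label_loss(consistent)
_, _, loss_bad = pseudo_label_loss(inconsistent)
```

In the second input the parity logits contradict the parity implied by the predicted label, so its loss is larger; minimizing this loss pushes the network toward logit vectors that form valid codewords.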

As described above, in the third embodiment, the computer executes the learning algorithm shown in FIG. 12 by using the learning neural network shown in FIG. 11. Specifically, the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning.

After that, the computer converts the logit vector with parity after error correction into a binary vector, adds a parity check code to the binary vector, and calculates the label vector with parity.

Then, the computer updates the parameters of the neural network used for estimating the label by learning so that the value of the loss function of the logit vector with parity and the label vector with parity after error correction is minimized.

Thereby, in the third embodiment, it is possible to perform unsupervised learning and to construct a neural network capable of robust classification against data noise, diversity, and parameter deviation.

[4-4. Fourth Embodiment]
In the fourth embodiment, semi-supervised learning is performed by combining supervised learning and unsupervised learning. Here, the supervised learning part is performed by the machine learning described in the first embodiment or the second embodiment, and the unsupervised learning is performed by the machine learning described in the third embodiment.

In this way, it is possible to realize unsupervised learning (self-learning) in which a neural network that has become able to classify correctly through supervised learning is used to generate labels from the data, and the neural network is then trained according to those labels.

FIG. 13 is a flowchart showing an example of the learning algorithm according to the fourth embodiment of the present disclosure. As shown in FIG. 13, in the learning algorithm according to the fourth embodiment, first, the learning network A of the first or second embodiment is configured (step S301).

Subsequently, the learning network B of the third embodiment that shares the parameters with the learning network A is configured (step S302). The mini-batch is then sampled from the labeled dataset (step S303).

Subsequently, the mini-batch is input to the learning network A to update the parameters of the learning network (step S304). The mini-batch is then sampled from the unlabeled dataset (step S305).

Subsequently, the mini-batch is input to the learning network B to update the parameters of the learning network (step S306). Then, it is determined whether or not the predetermined convergence condition is satisfied (step S307).

When the convergence condition is not satisfied (step S307, No), the process returns to step S303, and the processing of steps S303 to S306 is repeated until the convergence condition is satisfied. When the convergence condition is satisfied (step S307, Yes), the process ends.

Although the case where the processes of steps S303 to S306 are repeated until the convergence condition is satisfied has been described here, the processes of steps S303 to S306 may be repeated up to a predetermined maximum number of repetitions.
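The control flow of FIG. 13, including the optional maximum number of repetitions, can be sketched as a small loop. The function `semi_supervised_train` and its callable arguments are hypothetical placeholders: `step_a` stands for one parameter update with learning network A (supervised), `step_b` for one update with learning network B (unsupervised), and both are assumed to act on the same shared parameters.

```python
import itertools
from typing import Any, Callable, Iterator, List, Tuple


def semi_supervised_train(step_a: Callable[[Any], None],
                          step_b: Callable[[Any], None],
                          labeled_batches: Iterator[Any],
                          unlabeled_batches: Iterator[Any],
                          converged: Callable[[], bool],
                          max_iters: int = 1000) -> int:
    """Alternate supervised and unsupervised updates; return iterations run."""
    for iteration in range(1, max_iters + 1):
        step_a(next(labeled_batches))    # steps S303 and S304
        step_b(next(unlabeled_batches))  # steps S305 and S306
        if converged():                  # step S307
            return iteration
    return max_iters


# Dummy usage: the "updates" only record which batch they received.
log: List[Tuple[str, str]] = []
n_iterations = semi_supervised_train(
    step_a=lambda batch: log.append(("A", batch)),
    step_b=lambda batch: log.append(("B", batch)),
    labeled_batches=itertools.cycle(["L0", "L1"]),
    unlabeled_batches=itertools.cycle(["U0", "U1"]),
    converged=lambda: len(log) >= 6,
)
```

Passing the update steps as callables keeps the loop independent of how networks A and B are built, which matches the description that they differ only in their loss paths while sharing parameters.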

Thereby, in the fourth embodiment, it is possible to perform semi-supervised learning and to construct a neural network capable of robust classification against data noise, diversity, and parameter deviation.

[4-5. Fifth Embodiment]
According to the first embodiment and the second embodiment, it is possible to learn the multi-label problem. Multi-label learning enables weakly supervised learning with multiple instances. As an application of weakly supervised learning by multiple instances, for example, position estimation of a specified object on an image can be considered.

The learning network for object position estimation by normal multiple instance learning has the following configuration. Data → Feature vector calculation network → Map calculation network by class → Global max pooling by map → Logit vector by class. A multi-label (multi-hot vector) is used as the label. The loss function uses the sigmoid cross entropy of the class-specific logit vector and the multi-label vector.

Here, since the input is an image, the feature vector calculation network uses a network composed of convolution layers as described above. The convolution network is composed of a convolution process in which filters are swept vertically and horizontally in a map (RGB for input data) and a process in which weighted additions are performed between maps.

When this process is written by an equation, it can be written as the following equation (10).

Here, f_{k,u,v} is the input variable (a tensor of size number of input maps K × vertical × horizontal), and h_{m,u,v} is the output variable (a tensor of size number of output maps M × vertical × horizontal). Further, W_{m,k,p,q} denotes the filters of size P × Q, of which there are (number of input maps) × (number of output maps).

The class-specific map calculation network is a network that uses this convolution layer to perform convolution with the same number of maps as the number of classes. Maps for each class are expected to be learned to represent the object-likeness of each corresponding object location.

This information indicates whether or not an object of the specified class exists and where it is. However, the multi-label information indicates only whether the object exists or not. Since the information on where the object is located is not needed to determine whether the object exists, it is removed.

Global max pooling is a process that takes the maximum value within each map. Applying this to the output of the class-specific maps (object-likeness maps) yields, for each class, the score of the place where the object is most likely to be.
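The convolution and the global max pooling described above can be sketched in plain Python as follows. The index convention h_{m,u,v} = Σ_k Σ_p Σ_q W_{m,k,p,q} f_{k,u+p,v+q} without padding is an assumption, since equation (10) is not reproduced in this text, and the function names are hypothetical.

```python
from typing import List, Tuple

Map = List[List[float]]


def conv2d(f: List[Map], W: List[List[Map]]) -> List[Map]:
    """Convolve K input maps with filters W[m][k][p][q] (no padding)."""
    K, U, V = len(f), len(f[0]), len(f[0][0])
    M, P, Q = len(W), len(W[0][0]), len(W[0][0][0])
    out = [[[0.0] * (V - Q + 1) for _ in range(U - P + 1)] for _ in range(M)]
    for m in range(M):
        for u in range(U - P + 1):
            for v in range(V - Q + 1):
                out[m][u][v] = sum(
                    W[m][k][p][q] * f[k][u + p][v + q]
                    for k in range(K) for p in range(P) for q in range(Q))
    return out


def global_max_pool(maps: List[Map]) -> List[float]:
    """One score per map: the maximum value anywhere in that map."""
    return [max(max(row) for row in m) for m in maps]


def argmax_location(m: Map) -> Tuple[int, int]:
    """Position (row, column) of the maximum, i.e. the estimated location."""
    cells = ((u, v) for u in range(len(m)) for v in range(len(m[0])))
    return max(cells, key=lambda uv: m[uv[0]][uv[1]])


# One 3x3 input map and two 1x1 filters, giving two class-specific maps.
f = [[[1.0, 2.0, 1.0], [1.0, 1.0, 5.0], [3.0, 1.0, 1.0]]]
W = [[[[1.0]]], [[[-1.0]]]]
maps = conv2d(f, W)
scores = global_max_pool(maps)
location = argmax_location(maps[0])
```

The pooled scores carry only the presence information used for the loss, while the position of the maximum in each class-specific map remains available afterwards as the location estimate.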

In learning, the network is trained so that the score of the place where the object is most likely to be goes down if the label says that there is no object, and goes up if the label says that there is an object.

In this disclosure, the learning network will be changed as follows. Data → Feature vector calculation network → Map calculation network by class with parity → Global max pooling by map → Logit vector by class with parity → Error correction layer → Logit vector by class.

The label uses a multi-label (multi-hot vector). The loss function uses the sigmoid cross entropy of the class-specific logit vector and the multi-label vector.

Specifically, the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning that has multiple labels, converting the feature amount into a logit vector, adding a parity check code to the logit vector, and calculating a class-specific map with parity for each class to which the labels belong.

After that, the computer performs global max pooling on the class-specific maps with parity to calculate the class-specific logit vector with parity, and executes a process of correcting errors in the class-specific logit vector with parity based on the parity check code to calculate the class-specific logit vector.

Then, the computer executes a process of learning so as to minimize the value of the loss function between the error-corrected class-specific logit vector and the vector of the plurality of labels, and updating the parameters of the neural network used for label estimation. By doing so, it can be expected that the information on whether or not the object exists becomes more robust and that the object position estimation is realized with high accuracy. In this way, the computer can estimate the positions of a plurality of specified objects on the image with high accuracy by performing weakly supervised learning by multiple instances.

[5. Deep Learning Framework]
The deep learning framework according to the present disclosure provides a lineup of layers so that a coding layer and an error correction layer can be added to a commonly used neural network for classification problems. As the coding layer, a Hamming layer, an LDPC layer, a turbo layer, and the like can be considered.

The Hamming layer and the LDPC layer should accept a generator matrix as input. The turbo layer should accept the Viterbi encoder configuration and the interleave matrix as input.

As the error correction layer, a posterior probability layer, a BCJR layer, a sum-product decoding layer, and the like can be considered. Of these, the BCJR layer and the sum-product layer involve iterative processing, so the maximum number of iterations can be specified.

The posterior probability layer should accept a generator matrix as input. The BCJR layer should accept the Viterbi encoder configuration and the interleave matrix as input. The sum-product layer should accept the parity check matrix corresponding to the generator matrix as input.
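To illustrate the relationship between the generator matrix given to a coding layer and the parity check matrix given to the corresponding error correction layer, the sketch below derives one from the other for a systematic code. It relies on the standard fact that G = [I | P] corresponds to H = [Pᵀ | I] over GF(2). The function names and the (7,4) Hamming generator matrix are illustrative assumptions, not part of the disclosed framework.

```python
from typing import List

Matrix = List[List[int]]


def parity_check_from_generator(G: Matrix) -> Matrix:
    """Build H = [P^T | I] from a systematic generator matrix G = [I | P]."""
    k, n = len(G), len(G[0])
    for i in range(k):
        for j in range(k):
            if G[i][j] != (1 if i == j else 0):
                raise ValueError("G must be in systematic form [I | P]")
    P = [row[k:] for row in G]
    r = n - k
    return [[P[i][j] for i in range(k)] + [1 if c == j else 0 for c in range(r)]
            for j in range(r)]


def encode(bits: List[int], G: Matrix) -> List[int]:
    """Codeword = bits * G over GF(2)."""
    return [sum(b * g for b, g in zip(bits, column)) % 2 for column in zip(*G)]


def syndrome(word: List[int], H: Matrix) -> List[int]:
    """All-zero syndrome means every parity check is satisfied."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


# Illustrative (7,4) Hamming generator matrix in systematic form.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = parity_check_from_generator(G)
codeword = encode([1, 0, 1, 1], G)
```

Deriving H from G in one place keeps the coding layer and the error correction layer consistent with each other, which matters because a mismatched pair would make the decoder correct toward the wrong codewords.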

[6. Information processing device configuration]
FIG. 14 is a schematic explanatory view of the configuration of the information processing apparatus according to the present embodiment. The information processing device 1 shown in FIG. 14 is realized by, for example, a computer such as a GPU (Graphics Processing Unit) or a CPU (Central Processing Unit).

The information processing device 1 includes an information processing unit 2 that estimates the label of the classification problem using the neural network 3 whose parameters are machine-learned. The neural network 3 includes a feature extraction layer 31, an LDPC / turbo decoding unit 32, a loss layer 33, and an LDPC / turbo coding unit 34.

In the feature extraction layer, the data of the classification problem is input, and the feature amount of the data is extracted from the data of the classification problem. The LDPC / turbo decoding unit 32 converts the feature quantity into a logit vector, adds a parity check code to the logit vector to calculate the logit vector with parity, and functions as a layer that corrects errors in the logit vector with parity based on the parity check code. The loss layer estimates the label of the classification problem based on the logit vector obtained by removing the parity check code from the error-corrected logit vector with parity.

In the loss layer, for example, when the information processing unit 2 performs the supervised learning of the second embodiment, the label of the classification problem is input from the outside. Further, when, for example, the information processing unit 2 performs the supervised learning of the first embodiment, the LDPC / turbo coding unit 34 receives the label of the classification problem from the outside, functions as the coding layer shown in FIG. 1, and performs label encoding and the like.

Encoders and decoders for turbo codes and LDPC codes are often already available as high-speed hardware for use in communication technology. For this reason, the LDPC / turbo decoding unit 32 and the LDPC / turbo coding unit 34 are not limited to running in the deep learning execution environment; for example, hardware such as an ASIC (Application Specific Integrated Circuit) may be adopted to perform the calculation directly.

[7. Effects]
As described above, in the information processing method according to the present disclosure, the computer executes a process of extracting the feature amount of the data from the data of the classification problem. The computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity. The computer executes an error correction process of the logit vector with parity based on the parity check code. The computer performs a process of estimating the label of the classification problem based on the logit vector obtained by excluding the parity check code from the error-corrected logit vector with parity.

In this way, the computer vectorizes the data of the classification problem, adds a parity check code, corrects errors based on the parity check code, and estimates the label. This makes it possible to perform classification that is robust against data noise, diversity, and parameter deviation. Therefore, the computer can improve the estimation accuracy of the label of the classification problem.

Further, in the information processing method according to the present disclosure, the computer converts the label of the classification problem for learning into a binary vector, adds a parity check code to the binary vector, and executes a process of calculating the label vector with parity. The computer executes a process of extracting data features from the data of the classification problem for learning. The computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity. The computer performs a process of learning to minimize the value of the loss function between the logit vector with parity and the label vector with parity and updating the parameters of the neural network used for label estimation.

This allows the computer to build a neural network that can robustly classify data noise, diversity, and parameter deviations.

Further, in the information processing method according to the present disclosure, the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning. The computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity. The computer executes an error correction process of the logit vector with parity based on the parity check code. The computer executes a process of learning so as to minimize the value of the loss function between the error-corrected logit vector with parity and the label of the classification problem for learning, and updating the parameters of the neural network used to estimate the label.

As a result, the computer can construct a neural network that can perform robust classification against data noise, diversity, and parameter deviation without performing binary vectorization of labels used for learning.

Further, in the information processing method according to the present disclosure, the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning. The computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating the logit vector with parity. The computer executes an error correction process of the logit vector with parity based on the parity check code. The computer converts the logit vector with parity after error correction into a binary vector, adds a parity check code to the binary vector, and executes a process of calculating a label vector with parity. The computer performs a process of learning to minimize the value of the loss function between the error-corrected logit vector with parity and the label vector with parity, and updating the parameters of the neural network used for label estimation.

As a result, the computer can perform unsupervised learning and can construct a neural network that can perform robust classification against data noise, diversity, and parameter deviation.

Further, in the information processing method according to the present disclosure, the computer updates the parameters of the neural network by the information processing method according to the first embodiment or the information processing method according to the second embodiment, and executes a process of further updating the updated parameters by the information processing method according to the third embodiment.

As a result, the computer can perform semi-supervised learning and can construct a neural network that can perform robust classification against data noise, diversity, and parameter deviation.

Further, in the information processing method according to the present disclosure, the computer executes a process of extracting the feature amount of the data from the data of the classification problem for learning having a plurality of labels. The computer converts the feature quantity into a logit vector, adds a parity check code to the logit vector, and executes a process of calculating a class-specific map with parity for each class to which the labels belong. The computer performs global max pooling on the class-specific map with parity to calculate the class-specific logit vector with parity. The computer executes a process of correcting errors in the class-specific logit vector with parity based on the parity check code and calculating the class-specific logit vector. The computer executes a process of learning so as to minimize the value of the loss function between the error-corrected class-specific logit vector and the vector of the plurality of labels, and updating the parameters of the neural network used for label estimation.

As a result, the computer can estimate the positions of a plurality of specified objects on the image with high accuracy by performing weakly supervised learning using multiple instances.

In the information processing method according to the present disclosure, the computer corrects errors by using a function in which error back propagation is defined. As a result, when performing supervised learning, the computer performs error correction on the data of the classification problem using a function in which error back propagation is defined, and can thereby embed in the neural network parameters a coding method corresponding to the inverse of the error correction processing.

Further, the information processing program according to the present disclosure causes a computer to execute a process of extracting the feature amount of the data from the data of the classification problem. The program causes the computer to execute a process of converting the feature quantity into a logit vector, adding a parity check code to the logit vector, and calculating a logit vector with parity. The program causes the computer to execute a process of correcting errors in the logit vector with parity based on the parity check code. The program causes the computer to execute a process of estimating the label of the classification problem based on the logit vector obtained by excluding the parity check code from the logit vector with parity after error correction.

Further, the information processing device 1 according to the present disclosure has an information processing unit 2 that estimates the label of the classification problem by using the machine-learned neural network 3. The neural network 3 includes a layer that extracts the feature amount of the data from the data of the classification problem, a layer that converts the feature amount into a logit vector, adds a parity check code to the logit vector, and calculates a logit vector with parity, and a layer that corrects errors in the logit vector with parity based on the parity check code. The information processing unit estimates the label of the classification problem based on the logit vector obtained by excluding the parity check code from the logit vector with parity after error correction.

In this way, the information processing device vectorizes the data of the classification problem, adds a parity check code, corrects errors based on the parity check code, and estimates the label. This makes it possible to perform classification that is robust against data noise, diversity, and parameter deviation. Therefore, the information processing device can improve the estimation accuracy of the label of the classification problem.

[8. Others]
Further, among the processes described in each of the above embodiments, all or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above document and drawings can be changed arbitrarily unless otherwise specified. For example, the various information shown in each figure is not limited to the illustrated information.

Further, each component of each device shown in the figures is a functional concept, and does not necessarily have to be physically configured as shown in the figures. That is, the specific form of distribution and integration of each device is not limited to the one shown in the figures, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.

Further, each of the above-described embodiments and modifications can be appropriately combined as long as the processing contents do not contradict each other.

Further, the effects described in the present specification are merely examples and are not limiting, and other effects may be obtained.

[9. Hardware configuration]
An information device such as the information processing device 1 according to each of the above-described embodiments is realized by, for example, a computer 1000 having a configuration as shown in FIG. 15. Hereinafter, the information processing device 1 according to the embodiment will be described as an example. FIG. 15 is a hardware configuration diagram showing an example of a computer 1000 that realizes the functions of the information processing device 1. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input / output interface 1600. Each part of the computer 1000 is connected by a bus 1050.

The CPU 1100 operates based on the program stored in the ROM 1300 or the HDD 1400, and controls each part. For example, the CPU 1100 expands the program stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes processing corresponding to various programs.

The ROM 1300 stores a boot program such as a BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 is started, a program that depends on the hardware of the computer 1000, and the like.

The HDD 1400 is a computer-readable recording medium that non-transitorily records a program executed by the CPU 1100 and data used by the program. Specifically, the HDD 1400 is a recording medium for recording an information processing program according to the present disclosure, which is an example of program data 1450.

The communication interface 1500 is an interface for the computer 1000 to connect to an external network 1550 (for example, the Internet). For example, the CPU 1100 receives data from another device or transmits data generated by the CPU 1100 to another device via the communication interface 1500.

The input / output interface 1600 is an interface for connecting the input / output device 1650 and the computer 1000. For example, the CPU 1100 receives data from an input device such as a keyboard or mouse via the input / output interface 1600. Further, the CPU 1100 transmits data to an output device such as a display, a speaker, or a printer via the input / output interface 1600. Further, the input / output interface 1600 may function as a media interface for reading a program or the like recorded on a predetermined recording medium (media). The media is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or PD (Phase change rewritable disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the information processing device 1 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the information processing unit 2 and the like by executing the information processing program loaded on the RAM 1200. Further, the information processing program according to the present disclosure and the data in the storage unit 120 are stored in the HDD 1400. The CPU 1100 reads the program data 1450 from the HDD 1400 and executes the program, but as another example, these programs may be acquired from another device via the external network 1550.

Note that the effects described in the present specification are merely examples and are not limiting; other effects may also be obtained.

The present technology can also have the following configurations.
(1)
An information processing method in which a computer estimates, using a neural network, the category to which input data belongs, wherein
the neural network executes processing of:
calculating a feature vector from the input data; and
calculating, based on the feature vector, a probability or score of the category to which the input data belongs, using a decoding operation corresponding to predetermined error correction coding.
(2)
The feature vector is a vector having a number of dimensions corresponding to the code length of the error correction code used for the predetermined error correction coding.
The information processing method according to (1) above.
(3)
The decoding operation corresponding to the predetermined error correction coding is composed of a combination of operations that allow error backpropagation.
The information processing method according to (1) or (2) above.
(4)
The computer trains the neural network using the error backpropagation method.
The information processing method according to any one of (1) to (3) above.
(5)
The error correction code used for the predetermined error correction coding is a low density parity check code.
The information processing method according to any one of (1) to (4) above.
(6)
The error correction code used for the predetermined error correction coding is a turbo code.
The information processing method according to any one of (1) to (4) above.
(7)
The decoding operation corresponding to the predetermined error correction coding is the maximum likelihood decoding algorithm.
The information processing method according to any one of (1) to (6) above.
(8)
The decoding operation corresponding to the predetermined error correction coding is the BCJR algorithm.
The information processing method according to any one of (1) to (6) above.
(9)
The decoding operation corresponding to the predetermined error correction coding is a sum-product algorithm.
The information processing method according to any one of (1) to (6) above.
(10)
An information processing device that estimates, using a neural network, the category to which input data belongs, wherein
the neural network:
calculates a feature vector from the input data; and
calculates, based on the feature vector, a probability or score of the category to which the input data belongs, using a decoding operation corresponding to predetermined error correction coding.
(11)
The feature vector is a vector having a number of dimensions corresponding to the code length of the error correction code used for the predetermined error correction coding.
The information processing device according to (10) above.
(12)
The decoding operation corresponding to the predetermined error correction coding is composed of a combination of operations that allow error backpropagation.
The information processing device according to (10) or (11).
(13)
The neural network is trained using the error backpropagation method.
The information processing device according to any one of (10) to (12).
(14)
The error correction code used for the predetermined error correction coding is a low density parity check code.
The information processing device according to any one of (10) to (13).
(15)
The error correction code used for the predetermined error correction coding is a turbo code.
The information processing device according to any one of (10) to (13).
(16)
The decoding operation corresponding to the predetermined error correction coding is the maximum likelihood decoding algorithm.
The information processing device according to any one of (10) to (15).
(17)
The decoding operation corresponding to the predetermined error correction coding is the BCJR algorithm.
The information processing device according to any one of (10) to (15).
(18)
The decoding operation corresponding to the predetermined error correction coding is a sum-product algorithm.
The information processing device according to any one of (10) to (15).
(19)
An information processing program that causes a computer to execute a process of estimating, using a neural network, the category to which input data belongs, wherein
the neural network executes processing of:
calculating a feature vector from the input data; and
calculating, based on the feature vector, a probability or score of the category to which the input data belongs, using a decoding operation corresponding to predetermined error correction coding.
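
As an illustration of configurations (1), (2), and (7), the following is a minimal sketch, not the implementation of the present disclosure. It assumes a hypothetical codebook that assigns one length-7 binary codeword to each category, treats each component of the feature vector as a soft value whose positive sign favors bit 0, and scores every category by correlating the feature vector with the bipolar form of its codeword (soft-decision maximum likelihood decoding under an assumed Gaussian noise model). The names `CODEBOOK`, `ml_decode_scores`, and `softmax`, as well as the category labels, are assumptions introduced only for this example.

```python
import math

# Hypothetical codebook: one binary codeword (code length 7) per category.
CODEBOOK = {
    "cat":  [0, 0, 0, 0, 0, 0, 0],
    "dog":  [1, 1, 1, 0, 0, 0, 0],
    "bird": [1, 0, 0, 1, 1, 0, 0],
    "fish": [0, 1, 0, 1, 0, 1, 0],
}


def ml_decode_scores(feature, codebook):
    """Score each category by correlating the feature vector with its codeword.

    Assumption: feature[i] > 0 favors bit 0 and feature[i] < 0 favors bit 1,
    so each bit b is mapped to the bipolar value (1 - 2 * b) before correlating.
    """
    scores = {}
    for label, codeword in codebook.items():
        scores[label] = sum(f * (1 - 2 * b) for f, b in zip(feature, codeword))
    return scores


def softmax(scores):
    """Convert per-category scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Feature vector with as many dimensions as the code length (7).
# It resembles the "dog" codeword, with the third component having the wrong sign.
feature = [-2.0, -1.5, 0.5, 1.0, 2.0, 1.2, 0.8]
probs = softmax(ml_decode_scores(feature, CODEBOOK))
best = max(probs, key=probs.get)
print(best, probs)
```

Every step in this sketch is a sum, a product, or an exponential, so the scores are differentiable with respect to the feature vector, which is the property that configuration (3) relies on for training by error backpropagation.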

1 Information processing device
2 Information processing unit
3 Neural network
31 Feature extraction layer
32 LDPC / Turbo decoding unit
33 Loss layer
34 LDPC / Turbo coding unit
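
For reference, the kind of computation that the LDPC / Turbo decoding unit 32 could perform when a sum-product algorithm is used, as in configurations (9) and (18), can be sketched as follows. This is a hedged illustration under stated assumptions, not the decoder of the present disclosure: the parity-check matrix `H` is a small (7,4) Hamming-type matrix chosen only for brevity rather than a true low density parity check matrix, the feature vector is treated as log-likelihood ratios whose positive values favor bit 0, and the function name `sum_product` is hypothetical. How the resulting per-bit values are mapped to category probabilities is outside the scope of this sketch.

```python
import math

# Assumed parity-check matrix (rows = checks, columns = code bits).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def sum_product(H, llr, iterations=1):
    """Return posterior log-likelihood ratios after sum-product iterations.

    llr[v] > 0 is assumed to favor bit 0 for code bit v.
    """
    n = len(H[0])
    edges = [(c, v) for c in range(len(H)) for v in range(n) if H[c][v]]
    v2c = {(c, v): llr[v] for (c, v) in edges}  # variable-to-check messages
    c2v = {edge: 0.0 for edge in edges}         # check-to-variable messages
    for _ in range(iterations):
        # Check node update (tanh rule), excluding the destination variable.
        for (c, v) in edges:
            prod = 1.0
            for (c2, v2) in edges:
                if c2 == c and v2 != v:
                    prod *= math.tanh(v2c[(c2, v2)] / 2.0)
            prod = max(min(prod, 0.999999), -0.999999)  # keep atanh finite
            c2v[(c, v)] = 2.0 * math.atanh(prod)
        # Variable node update, excluding the destination check.
        for (c, v) in edges:
            v2c[(c, v)] = llr[v] + sum(
                c2v[(c2, v2)] for (c2, v2) in edges if v2 == v and c2 != c
            )
    # Posterior: input value plus all incoming check messages.
    return [
        llr[v] + sum(c2v[(c, v2)] for (c, v2) in edges if v2 == v)
        for v in range(n)
    ]


# Input resembling the all-zero codeword, with the third component pointing the wrong way.
posterior = sum_product(H, [2.0, 1.5, -0.5, 1.0, 2.0, 1.2, 0.8], iterations=1)
print(posterior)
```

The update rules use only additions, multiplications, `tanh`, and `atanh`, all of which are differentiable, so a decoder of this form is compatible with training the preceding feature extraction layer 31 by error backpropagation.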

Claims

1. An information processing method in which a computer estimates, using a neural network, the category to which input data belongs, wherein the neural network executes processing of:
calculating a feature vector from the input data; and
calculating, based on the feature vector, a probability or score of the category to which the input data belongs, using a decoding operation corresponding to predetermined error correction coding.
2. The information processing method according to claim 1, wherein the feature vector is a vector having a number of dimensions corresponding to the code length of the error correction code used for the predetermined error correction coding.
3. The information processing method according to claim 1, wherein the decoding operation corresponding to the predetermined error correction coding is composed of a combination of operations that allow error backpropagation.
4. The information processing method according to claim 1, wherein the computer trains the neural network using the error backpropagation method.
5. The information processing method according to claim 1, wherein the error correction code used for the predetermined error correction coding is a low density parity check code.
6. The information processing method according to claim 1, wherein the error correction code used for the predetermined error correction coding is a turbo code.
7. The information processing method according to claim 1, wherein the decoding operation corresponding to the predetermined error correction coding is the maximum likelihood decoding algorithm.
8. The information processing method according to claim 1, wherein the decoding operation corresponding to the predetermined error correction coding is the BCJR algorithm.
9. The information processing method according to claim 1, wherein the decoding operation corresponding to the predetermined error correction coding is a sum-product algorithm.
10. An information processing device that estimates, using a neural network, the category to which input data belongs, wherein the neural network:
calculates a feature vector from the input data; and
calculates, based on the feature vector, a probability or score of the category to which the input data belongs, using a decoding operation corresponding to predetermined error correction coding.
11. The information processing device according to claim 10, wherein the feature vector is a vector having a number of dimensions corresponding to the code length of the error correction code used for the predetermined error correction coding.
12. The information processing device according to claim 10, wherein the decoding operation corresponding to the predetermined error correction coding is composed of a combination of operations that allow error backpropagation.
13. The information processing device according to claim 10, wherein the neural network is trained using the error backpropagation method.
14. The information processing device according to claim 10, wherein the error correction code used for the predetermined error correction coding is a low density parity check code.
15. The information processing device according to claim 10, wherein the error correction code used for the predetermined error correction coding is a turbo code.
16. The information processing device according to claim 10, wherein the decoding operation corresponding to the predetermined error correction coding is the maximum likelihood decoding algorithm.
17. The information processing device according to claim 10, wherein the decoding operation corresponding to the predetermined error correction coding is the BCJR algorithm.
18. The information processing device according to claim 10, wherein the decoding operation corresponding to the predetermined error correction coding is a sum-product algorithm.
19. An information processing program that causes a computer to execute a process of estimating, using a neural network, the category to which input data belongs, wherein the neural network executes processing of:
calculating a feature vector from the input data; and
calculating, based on the feature vector, a probability or score of the category to which the input data belongs, using a decoding operation corresponding to predetermined error correction coding.