WO2023095204A1 - Learning device, autoencoding device, learning method, autoencoding method, and program - Google Patents

Learning device, autoencoding device, learning method, autoencoding method, and program

Info

Publication number
WO2023095204A1
Authority
WO
WIPO (PCT)
Prior art keywords
encoding
feature amount
self
main data
learning
Prior art date
Application number
PCT/JP2021/042980
Other languages
French (fr)
Japanese (ja)
Inventor
忍 工藤
幸浩 坂東
正樹 北原
Original Assignee
日本電信電話株式会社
Priority date
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to JP2023563381A
Priority to PCT/JP2021/042980
Publication of WO2023095204A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present invention relates to a learning device, a self-encoding device, a learning method, a self-encoding method and a program.
  • the technique of Non-Patent Document 1 is a technology called VQVAE.
  • in the technique of Non-Patent Document 1, end-to-end learning is not performed: the distribution of the feature vectors is assumed to be a uniform distribution, and the optimization of encoding and decoding and the optimization of the representative vectors are carried out separately.
  • the technique of Non-Patent Document 2 is a technique called Soft to Hard.
  • the technique of Non-Patent Document 2 performs end-to-end learning by approximating the quantization process using a softmax function and approximating the probability of occurrence of representative vectors with a histogram.
  • the technique of Non-Patent Document 1 and the technique of Non-Patent Document 2 both have problems related to obtaining self-encoding processing using vector quantization.
  • the problem may be, for example, a stability problem, or a processing-performance problem in which, when the dimension of the vector is large or when encoding is performed at a high rate, the amount of computation or the memory usage increases exponentially.
  • for these reasons, the burden required to obtain self-encoding processing using vector quantization was sometimes large.
  • for example, the load required for self-encoding using vector quantization can be so large that self-encoding using vector quantization cannot be realized owing to a lack of computer resources.
  • an object of the present invention is to provide a technology that reduces the burden required for self-encoding using vector quantization.
  • One aspect of the present invention is a learning device including a learning unit that updates, by learning, the encoding process and the decoding process in a self-encoding process using vector quantization, the self-encoding process using a main data feature amount that is a feature amount of a self-encoding target and an auxiliary feature amount that is a feature amount of the main data feature amount, and performing entropy coding on the result of vector quantization of the main data feature amount and on the result of scalar quantization of the auxiliary feature amount. In the learning, the learning unit executes main-data-side probability estimation processing for estimating the occurrence probability of each element of the tensor indicating the main data feature amount, and the main-data-side probability estimation processing estimates the occurrence probability using the result of integrating a parameterized probability density function over an integration region that is one of the regions into which the vector space in which the representative vectors are arranged in a lattice is divided, that contains one lattice point of the vector space, and that has a hyperrectangular parallelepiped shape.
  • One aspect of the present invention is an autoencoding device including a self-encoding target acquisition unit that acquires a self-encoding target, the learning unit described above, and a self-encoding execution unit that performs self-encoding by vector quantization on the target acquired by the self-encoding target acquisition unit, using the learned encoding process and the learned decoding process.
  • One aspect of the present invention is a learning method including a learning step of updating, by learning, the encoding process and the decoding process in a self-encoding process using vector quantization, in which the occurrence probability is estimated using the result of integrating a parameterized probability density function with a given region as the integration region.
  • One aspect of the present invention is a self-encoding method including a self-encoding target acquisition step of acquiring a self-encoding target, the learning described above, and a self-encoding execution step of performing self-encoding by vector quantization on the target acquired in the self-encoding target acquisition step, using the learned encoding process and the learned decoding process.
  • One aspect of the present invention is a program for causing a computer to function as the above learning device.
  • One aspect of the present invention is a program for causing a computer to function as the above self-encoding device.
  • the present invention makes it possible to reduce the burden required for self-encoding using vector quantization.
  • FIG. 2 is an explanatory diagram for explaining LatticeVQ in the embodiment.
  • FIG. 3 is a first explanatory diagram for explaining an example of adding noise in the embodiment.
  • FIG. 4 is a second explanatory diagram for explaining an example of adding noise in the embodiment.
  • FIG. 7 is a diagram showing an example of the flow of the processing executed by the learning unit in the embodiment.
  • FIG. 8 is a first explanatory diagram for explaining an outline of the self-encoding device according to the embodiment.
  • FIG. 9 is a second explanatory diagram for explaining the outline of the self-encoding device in the embodiment.
  • FIG. 10 is a flowchart showing an example of the flow of processing executed by the encoder according to the embodiment.
  • a flowchart showing an example of the flow of processing executed by the decoder according to the embodiment.
  • a diagram showing an example of the configuration of the control unit provided in the learning device in the embodiment.
  • a diagram showing an example of the hardware configuration of the self-encoding device in the embodiment.
  • a diagram showing an example of the configuration of the control unit provided in the self-encoding device in the embodiment.
  • FIG. 1 is an explanatory diagram for explaining an overview of the learning device 1 of the embodiment.
  • the learning device 1 includes a learning section 10 .
  • the learning unit 10 performs learning so as to improve the performance of self-encoding using vector quantization of data represented by a tensor.
  • the performance of self-encoding is evaluated by the smallness of the RD cost, which is the weighted sum D + λR of the error D between the original data and the restored data and the code amount R of the data, weighted by the Lagrangian constant λ (hereinafter, D + λR is referred to as the RD cost).
  • a smaller RD cost indicates better RD performance.
  • learning means machine learning. Learning is, for example, deep learning.
  • data self-encoding means compression of data.
  • the learning unit 10 includes a learning network 100 and an optimization unit 113.
  • Learning network 100 is a neural network.
  • the learning network 100 includes a main data acquisition unit 101, a main data side encoding unit 102, an auxiliary data side encoding unit 103, a main data side noise addition unit 104, an auxiliary data side noise addition unit 105, an auxiliary data side probability estimation unit 106, an auxiliary data side decoding unit 107, an auxiliary entropy acquisition unit 108, a main data side probability estimation unit 109, a main entropy acquisition unit 110, a main data side decoding unit 111, and a reconstruction error calculation unit 112.
  • the optimization unit 113 updates the learning network 100 based on the output of the learning network 100 .
  • the main data acquisition unit 101 acquires data represented by a tensor as main data.
  • Data represented by a tensor is, for example, image data.
  • the tensor-expressed data may be, for example, time-series data of one or more channels of audio.
  • the data acquired by the main data acquisition unit 101 is hereinafter referred to as main data.
  • the main data side encoding unit 102 executes main data feature quantity acquisition processing.
  • the main data feature amount acquisition process is a process of encoding the main data.
  • encoding is a process of obtaining information indicating the features of the object to be encoded; in other words, encoding is a process of obtaining information indicating a feature amount.
  • the encoding of the main data by the main data side encoding unit 102 is a process of acquiring the feature amount of the main data. Therefore, the main data feature amount acquisition process is a process of acquiring the main data feature amount.
  • the main data feature quantity is encoded main data.
  • the content of the main data feature quantity acquisition process is updated by learning. That is, the contents of the processing executed by main data side encoding section 102 are updated by learning.
  • the auxiliary data side encoding unit 103 further encodes the main data feature quantity.
  • encoding is a process of acquiring information indicating a feature amount
  • encoding of the main data feature amount by the auxiliary data side encoding unit 103 is therefore a process of acquiring the feature amount of the main data feature amount. Hereinafter, the information obtained by further encoding the encoded main data is referred to as an auxiliary feature amount. That is, the auxiliary feature amount is information indicating the feature amount of the main data feature amount. Since the auxiliary feature amount is information obtained by encoding the main data feature amount, the entropy of the auxiliary feature amount is smaller than the entropy of the main data feature amount.
  • the process of encoding the main data feature quantity will be referred to as the auxiliary feature quantity acquisition process.
  • the content of the auxiliary feature quantity acquisition process is updated by learning. That is, the contents of the processing executed by the auxiliary data side encoding unit 103 are updated by learning.
  • the content of the auxiliary feature acquisition process is updated through learning so that the amount of information in the main data statistics information is included in the auxiliary feature.
  • the main data statistic information is information indicating the statistic of each probability distribution followed by the value of each element of the tensor representing the main data.
  • a statistic of the probability distribution is, for example, the degree of dispersion.
  • the statistic of the probability distribution may be not only the degree of dispersion but also a set of the degree of dispersion and a representative value.
  • data generally appears according to a probability distribution, such as the probability distribution of the appearance of each Roman character that appears in English.
  • the value of each element of the tensor representing the main data also follows the probability distribution.
  • the main data side noise addition unit 104 executes noise-added main data feature quantity acquisition processing.
  • the noisy main data feature amount acquisition process is a process of applying vector noise to the main data feature amount.
  • the vector noise addition process is a process whose processing target is a K-dimensional vector having a predetermined number K of elements (K is an integer of 2 or more; this vector is hereinafter referred to as the "noise addition target vector"), and is a process of adding noise to the processing target. Therefore, the noise-added main data feature amount acquisition process is a process of acquiring information in which noise has been added to the main data feature amount (hereinafter referred to as the "noise-added main data feature amount").
  • the noise addition target vector is a K-dimensional vector included in the tensor that expresses the main data feature amount.
  • a K-dimensional vector contained in a tensor means a vector whose k-th element (k is an integer of 1 or more and K or less) is the k-th of K consecutive elements among all the elements of the tensor.
  • the number of elements K of the noise addition target vector is the number of elements mapped to one code by vector quantization by a device that performs vector quantization using the learning result of the learning unit 10 .
  • the result of learning by the learning unit 10 is hereinafter referred to as a network learning result. Therefore, when K elements are collectively mapped to one code by vector quantization using the network learning result, the number of elements of the noise addition target vector is K. Note that when K elements are collectively mapped to one code by quantization, the obtained code is an index indicating a K-dimensional vector.
  • the number of elements K of the noise addition target vector is a predetermined number. Since vector quantization of data is data encoding, vector quantization using network learning results means data encoding using network learning results.
  • Lattice is a set of lattice points in vector space.
  • LatticeVQ is vector quantization when representative vectors are arranged in a lattice in a vector space. That is, LatticeVQ is vector quantization that satisfies the condition that representative vectors are arranged in a lattice in a vector space.
  • LatticeVQ is known to have better RD performance than scalar quantization, except for certain conditions.
  • FIG. 2 is an explanatory diagram explaining LatticeVQ in the embodiment. More specifically, FIG. 2 is an explanatory diagram for explaining LatticeVQ when the number of elements of the noise addition target vector is two.
  • FIG. 2 is an example of an A2 lattice.
  • one type of lattice is the A2 lattice.
  • one type of lattice is the E8 lattice.
  • one type of lattice is the Leech lattice. The A2 lattice, the E8 lattice, and the Leech lattice are each lattices for which LatticeVQ maximizes the RD performance for uniform distributions in their respective dimensions.
  • FIG. 2 is also a diagram showing an example of a lattice space.
  • the lattice space is two-dimensional, but if the representative vectors are K-dimensional, the lattice space is also K-dimensional.
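  • As a concrete illustration of LatticeVQ, the following Python sketch quantizes a vector to the nearest point of a lattice. The embodiment's own quantization rule (Equation (11) below) is not reproduced here; the sketch instead assumes the well-known D_n lattice (integer vectors with an even coordinate sum) and the standard rounding rule for it, purely as an example of mapping a K-dimensional vector to a representative vector arranged in a lattice.

```python
import numpy as np

def quantize_Dn(y: np.ndarray) -> np.ndarray:
    """Map a K-dimensional vector to the nearest point of the D_n lattice
    (the set of integer vectors whose coordinates sum to an even number)."""
    r = np.round(y)                      # nearest point of the integer lattice Z^n
    if int(r.sum()) % 2 == 0:
        return r
    # If the coordinate sum is odd, re-round the coordinate with the largest
    # rounding error in the other direction to restore an even sum.
    k = int(np.argmax(np.abs(y - r)))
    r[k] += np.sign(y[k] - r[k]) if y[k] != r[k] else 1.0
    return r

# Example: the nearest D_2 lattice point to (0.7, 0.6) is (1, 1).
print(quantize_Dn(np.array([0.7, 0.6])))
```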
  • FIG. 3 is a first explanatory diagram illustrating an example of adding noise in the embodiment.
  • FIG. 4 is a second explanatory diagram illustrating an example of adding noise in the embodiment.
  • a plurality of random K-dimensional vectors uniformly distributed within the (K−1)-dimensional sphere circumscribing the lattice unit region of the origin lattice (hereinafter referred to as the "target lattice unit region") are generated and arranged in the lattice space.
  • points are generated until at least one internal point in FIG. 4 (a point inside the target lattice unit region) is obtained.
  • a lattice unit area is each area resulting from dividing a lattice space into a plurality of areas so that a division condition is satisfied.
  • a lattice unit area is an area that divides a vector space in which representative vectors are arranged in a lattice and includes one lattice point of the vector space.
  • a region B1 in FIG. 3 is an example of the target lattice unit region.
  • the lattice unit regions are Voronoi regions.
  • a Voronoi region is each region obtained by dividing a metric space such as a lattice space into a plurality of regions by Voronoi division.
  • the noise points in FIG. 3 are an example of samples arranged within the circumscribed circle of the target grid unit area.
  • the process of arranging the samples in the lattice space is specifically the process of acquiring coordinates in the lattice space. Since a lattice unit area in the lattice space is a region within the lattice space, the process of arranging the samples within the circumscribed circle of any one lattice unit area is the process of obtaining coordinates within that circumscribed circle.
  • the process of distinguishing between internal points and excluded points is a process of determining, for each noise point, whether the coordinates are inside the target grid unit area or outside the target grid unit area based on the coordinates of the noise point.
  • the determination process is a process in which points that match the origin lattice as a result of vector quantization of the noise points by Equation (11) are determined to be coordinates within the region.
  • the addition process specifically means a process of adding a noise-addition target vector and a position vector indicating a selected noise point.
  • the position vector indicating the selected noise point is a type of noise because it is a quantity determined by random numbers. Since the position vector indicating the selected noise point is represented by a vector, the position vector indicating the selected noise point is noise represented by a vector. Therefore, the position vector indicating the selected noise point is hereinafter referred to as a noise vector. In this way, noise is added to the noise addition target vector.
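  • The noise-vector generation described above can be read as rejection sampling: candidate points are drawn uniformly inside the sphere circumscribing the target lattice unit region, only the points that quantize back to the origin lattice point (the internal points) are kept, and one kept point is used as the noise vector added to the noise addition target vector. The following is a minimal sketch under that reading; quantize_to_lattice stands in for the quantization of Equation (11) and radius for the circumscribed-sphere radius, both of which depend on the lattice actually used.

```python
import numpy as np

def sample_lattice_noise(K, radius, quantize_to_lattice, rng=None):
    """Draw one noise vector uniformly distributed over the lattice unit region
    of the origin (the set of points that quantize to the origin lattice point)."""
    rng = rng or np.random.default_rng()
    while True:
        # Uniform sample inside the K-dimensional ball of the given radius:
        # uniform direction on the sphere, radius scaled by U**(1/K).
        direction = rng.standard_normal(K)
        direction /= np.linalg.norm(direction)
        candidate = direction * radius * rng.uniform() ** (1.0 / K)
        # Keep the candidate only if it is an internal point.
        if np.all(quantize_to_lattice(candidate) == 0):
            return candidate

# Noise addition to a noise addition target vector y_i (cf. y_i <- y_i + u_y below):
# u_y = sample_lattice_noise(K, radius, quantize_to_lattice)
```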
  • the auxiliary data-side noise addition unit 105 executes auxiliary feature quantity acquisition processing with noise.
  • the noise-attached auxiliary feature acquisition process is a process of adding noise to the auxiliary feature. That is, the noise-attached auxiliary feature acquisition process is a process of acquiring information obtained by adding scalar noise to the auxiliary feature (hereinafter referred to as "noise-attached auxiliary feature"). More specifically, the auxiliary data noise adding unit 105 adds scalar noise to each element of the tensor representing the auxiliary feature amount.
  • the scalar noise is, for example, uniform noise between -1/2 and 1/2.
  • the auxiliary data side probability estimation unit 106 executes auxiliary data side probability estimation processing.
  • the auxiliary data side probability estimation process is a process of estimating the auxiliary data side probability based on the auxiliary feature amount with noise.
  • the auxiliary data side probability is information indicating the occurrence probability of each element of the tensor indicating the auxiliary feature amount.
  • Information on the probability distribution of each element of the tensor indicating the auxiliary feature is used for estimating the occurrence probability of each element of the tensor indicating the auxiliary feature.
  • the auxiliary data side probability estimation unit 106 acquires a probability distribution given in advance by reading it from a predetermined storage device such as the storage unit 14 described later. Then, the auxiliary data side probability estimation unit 106 estimates the occurrence probability of each element of the tensor indicating the auxiliary feature amount based on the acquired probability distribution.
  • the probability distribution given in advance is, for example, a probability distribution expressed using a cumulative distribution function.
  • based on the parameterized auxiliary feature cumulative distribution function, the auxiliary data side probability estimation unit 106 estimates the occurrence probability of each element of the tensor representing the auxiliary feature quantity.
  • the parameterized auxiliary feature cumulative distribution function is a parameterized function indicating the probability distribution of each element of the tensor indicating the auxiliary feature.
  • the parameter of the parameterized auxiliary feature cumulative distribution function is specifically a parameter that changes according to the statistic representing the probability distribution of each element of the tensor representing the auxiliary feature.
  • the parameter values of the parameterized auxiliary feature cumulative distribution function are updated by learning. That is, the content of the processing executed by the auxiliary data side probability estimation unit 106 is updated by learning.
  • a parameterized cumulative distribution function is, for example, a parameterized sigmoid function or softplus function. Since the parameter values of the parameterized auxiliary feature cumulative distribution function are updated by learning as described above, the parameter values of the parameterized cumulative distribution function are updated by learning.
  • FIG. 5 is an explanatory diagram illustrating the relationship between cumulative distribution functions and occurrence probabilities in the embodiment.
  • An image G1 is an image showing an example of a probability density function that indicates the probability density of the random variable q.
  • the symbol Δ indicates the step size of quantization.
  • the image G1 shows that the value obtained by integrating the probability density function over the range ⁇ within the domain centered on the value q is the occurrence probability p(q) of the value q.
  • the step size ⁇ is, for example, the magnitude of the closed interval [ ⁇ 1/2, 1/2] (ie 1).
  • Image G2 is the cumulative distribution function cdf(q) obtained as a result of integration of the probability density function of image G1.
  • the result of integration of the probability density function is represented by a monotonically increasing function such as a sigmoid function.
  • Image G2 shows that the probability of occurrence p(q) is equal to cdf(q+ ⁇ /2) ⁇ cdf(q ⁇ /2).
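  • The relationship shown in FIG. 5 can be written directly: the occurrence probability is the difference of the cumulative distribution function evaluated half a step above and below the value. The sketch below assumes a parameterized sigmoid as the cumulative distribution function and Δ = 1, purely for illustration.

```python
import numpy as np

def cdf(x, loc=0.0, scale=1.0):
    """A parameterized cumulative distribution function (here, a sigmoid)."""
    return 1.0 / (1.0 + np.exp(-(x - loc) / scale))

def occurrence_probability(q, delta=1.0, loc=0.0, scale=1.0):
    """p(q) = cdf(q + delta/2) - cdf(q - delta/2)."""
    return cdf(q + delta / 2, loc, scale) - cdf(q - delta / 2, loc, scale)

print(occurrence_probability(0.0))   # probability mass of the bin centered at q = 0
```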
  • the auxiliary data side decoding unit 107 executes auxiliary feature amount decoding processing.
  • the auxiliary feature amount decoding process is a process for processing information obtained based on the auxiliary feature amount, and is a process for decoding the processing target.
  • a processing target of the auxiliary feature amount decoding process executed by the auxiliary data side decoding unit 107 is the auxiliary feature amount with noise.
  • the information obtained by decoding the auxiliary feature with noise will be referred to as auxiliary data. Therefore, the auxiliary data decoding unit 107 acquires auxiliary data by decoding the auxiliary feature with noise.
  • the content of the auxiliary feature decoding process is updated through learning so that the RD cost is reduced. That is, the contents of the processing executed by the auxiliary data side decoding unit 107 are updated by learning.
  • the auxiliary entropy acquisition unit 108 acquires auxiliary entropy based on the auxiliary data side probability, which is the estimation result of the auxiliary data side probability estimation unit 106 .
  • Auxiliary entropy is the entropy of auxiliary features.
  • the main data side probability estimation unit 109 executes main data side probability estimation processing.
  • the main data side probability estimation process is a process of estimating the main data side probability based on the noise-added main data feature amount and the auxiliary data.
  • the main data side probability is information indicating the occurrence probability of each element of the tensor indicating the main data feature amount.
  • Information on the probability distribution of each element of the tensor representing the main data feature is used to estimate the occurrence probability of each element of the tensor representing the main data feature.
  • based on the parameterized main data feature amount cumulative distribution function, the main data side probability estimation unit 109 estimates the occurrence probability of each element of the tensor representing the main data feature amount.
  • the parameterized main data feature value cumulative distribution function is a cumulative distribution function that indicates the probability distribution of each element of the tensor that indicates the main data feature value and is a parameterized cumulative distribution function.
  • a probability distribution is, for example, a Gaussian distribution.
  • the parameter of the parameterized main data feature quantity cumulative distribution function is specifically a statistic representing the probability distribution of each element of the tensor representing the main data feature quantity. Therefore, in learning, the value indicated by the auxiliary data is used as the value of the parameter of the parameterized main data feature quantity cumulative distribution function.
  • the auxiliary data obtained in the learned learning network 100 indicates the representative value and scatter of the Gaussian distribution.
  • an example of the main data side probability estimation processing will now be described; more specifically, an example of the main data side probability estimation processing when the LatticeVQ described above is used. When vector quantization is performed, each vector of the main data is represented by one of the representative vectors. Therefore, the explanation of the main data side probability estimation processing when LatticeVQ is used is, more specifically, an explanation of the processing of estimating the occurrence probability of a representative vector when LatticeVQ is used.
  • the process of estimating the occurrence probability of representative vectors will be referred to as representative vector occurrence probability estimation process.
  • the representative vector occurrence probability estimation process is an example of the main data side probability estimation process.
  • the main data side probability estimation process uses the parameterized main data feature amount cumulative distribution function.
  • the parameterized main data feature quantity cumulative distribution function is specifically a parameterized cumulative distribution function.
  • the cumulative distribution function is the result of integrating the probability density function. Therefore, representative vector occurrence probability estimation processing is processing using a cumulative distribution function. The use of the cumulative distribution function in the representative vector occurrence probability estimation process means that the result of integrating the parameterized probability density function in the grid unit area is used.
  • Voronoi region is one of the lattice unit regions as described above.
  • Voronoi tessellation is a well-known region segmentation method, and therefore is often used to obtain lattice unit regions.
  • the shape of the Voronoi region is hexagonal as described above.
  • a parametrized cumulative distribution function obtained using the results of hyperrectangular partitioning is used instead of Voronoi partitioning.
  • the hyperrectangular parallelepiped division is a process of dividing a lattice space into lattice unit regions each having a hyperrectangular parallelepiped shape.
  • a two-dimensional hyperrectangular parallelepiped is a rectangle, and a three-dimensional hyperrectangular parallelepiped is a rectangular parallelepiped.
  • FIG. 6 is an explanatory diagram explaining the hyperrectangular parallelepiped division in the embodiment. More specifically, FIG. 6 is a diagram showing an example of the result of hyperrectangular partitioning for a two-dimensional lattice space together with an example of the result of Voronoi partitioning. Both the “true region” and the “approximate region” in FIG. 6 are examples of grid unit regions.
  • the “true domain” is the domain resulting from the Voronoi division. That is, the “true domain” is the Voronoi domain.
  • a “true area” is a hexagonal lattice unit area.
  • “Approximate region” is the grid unit region resulting from the hypercube division.
  • the shape of the "approximation area” is a rectangular parallelepiped.
  • one example of the result of the hyperrectangular parallelepiped division executed by the main data side probability estimation unit 109 is the result of the division by the "approximate region”.
  • S1 in FIG. 6 is the length of one side of the rectangle that is the shape of the "approximate region", and indicates the side length in the first dimension of the two-dimensional lattice space; S2 indicates the side length in the second dimension.
  • the length of the side in the first dimension means the length of the projection of the rectangle in the lattice space onto one of two orthogonal one-dimensional subspaces, and the length of the side in the second dimension means the length of the projection onto the other subspace.
  • more generally, the length of the n-th side of a hyperrectangular parallelepiped in an N-dimensional lattice space means the length of the projection of the hyperrectangular parallelepiped onto the n-th one-dimensional subspace; the projection of a hyperrectangular parallelepiped onto a one-dimensional subspace is a line segment.
  • integration with a two- or more-dimensional area as the integration domain can be obtained by iterative integration.
  • iterative integration performs N one-dimensional integrations on the function, one dimension at a time.
  • when the shape of the lattice unit region is a hyperrectangular parallelepiped, the iterative integration can be performed with the integration region of each of the N integrations being one side of the hyperrectangular parallelepiped.
  • in that case, each integration is not affected by the results of the other integrations.
  • on the other hand, when the shape of the lattice unit region is not a hyperrectangular parallelepiped, the integration range of each integral in the iterative integration is affected by the other integrals. Therefore, when the shape of the lattice unit region is a hyperrectangular parallelepiped, integration is easier than when the shape of the lattice unit region is a hyperpolyhedron that is not a hyperrectangular parallelepiped.
  • in the example of FIG. 6, the "true region" has a hexagonal shape and the "approximate region" has a rectangular shape, so integration over the "approximate region" is easier than integration over the "true region".
  • in other words, in each integration of the iterative integration, the integration is performed on a one-dimensional probability density function that is not affected by the other dimensions.
  • the probability density function to be iteratively integrated is a function that indicates a predetermined type of distribution and is a parameterized function.
  • a predetermined type of distribution is a Gaussian distribution.
  • the values of the parameters of the probability density function are values according to the auxiliary data.
  • a parameterized cumulative distribution function is obtained by performing iterative integration on the probability density function, using the hyperrectangular parallelepiped obtained by the hyperrectangular parallelepiped division as the integration region.
  • the values of the parameters of the probability density function are the values of the auxiliary data, so the parameterized cumulative distribution function thus obtained is a function according to the auxiliary data.
  • auxiliary data values are substituted for the parameters of the cumulative distribution function obtained in advance in this manner.
  • the cumulative distribution function thus obtained is used to estimate the occurrence probability of the representative vector.
  • the occurrence probability of representative vectors is estimated by executing the processes represented by the following equations (1) and (2).
  • the cumulative distribution function cdf in Equation (2) represents each cumulative distribution function obtained by each integration of iterative integration.
  • the result of integrating the m-dimensional (m is a natural number) cumulative distribution function for one dimension is the (m ⁇ 1)-dimensional cumulative distribution function.
  • Equation (1) is an example of the occurrence probability obtained using the parameterized main data feature quantity cumulative distribution function.
  • Equation (1) is a function parameterized via the cumulative distribution function cdf of Equation (2).
  • Equation (2) is a cumulative distribution function obtained as a result of integrating the probability density function in the one-dimensional direction of the grid unit area.
  • Equation (1) is the product of the occurrence probabilities of each dimension obtained by Equation (2). Therefore, equation (1) is the occurrence probability expressed using the result of integrating the probability density function over the entire lattice unit area.
  • Equation (1) indicates the occurrence probability of a representative vector. Since each lattice point represents a representative vector, the occurrence probability p_i at each lattice point means the occurrence probability p_i of each representative vector.
  • Δ_j means the side length of the lattice unit region in the dimension identified by j. Since the shape and size of the lattice unit region are predetermined, Δ_j is a predetermined length.
  • y given a hat symbol represented by the following equation (3) means the feature amount of the main data with noise.
  • the symbol A with a hat is hereinafter referred to as A ⁇ .
  • therefore, the hat-marked symbol y in equation (3) below is written y^.
  • Equation (1) indicates that the occurrence probability p_i is expressed as the product of the probabilities p_i[j] obtained for each dimension. If the shape of the lattice unit region were not a hyperrectangular parallelepiped, the occurrence probability p_i could not be expressed as a simple product of the probabilities p_i[j] obtained for each dimension. Therefore, because the shape of the lattice unit region is a hyperrectangular parallelepiped, the main data side probability estimation unit 109 can easily acquire the occurrence probability p_i, as shown by Equations (1) and (2).
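  • Equations (1) and (2) themselves are not reproduced in this text, but the description above implies that the occurrence probability of a representative vector is the product, over the dimensions of the hyperrectangular lattice unit region, of per-dimension cumulative-distribution differences. The sketch below is written under that assumption, using a Gaussian cumulative distribution function whose mean and scale are the statistics indicated by the auxiliary data.

```python
import numpy as np
from scipy.stats import norm

def representative_vector_probability(y_hat, delta, mean, scale):
    """Approximate occurrence probability of the representative vector whose
    lattice unit region contains y_hat, as the product over dimensions j of
    cdf(y_hat[j] + delta[j] / 2) - cdf(y_hat[j] - delta[j] / 2)."""
    upper = norm.cdf(y_hat + delta / 2.0, loc=mean, scale=scale)
    lower = norm.cdf(y_hat - delta / 2.0, loc=mean, scale=scale)
    per_dim = upper - lower         # assumed form of p_i[j] in Equation (2)
    return float(np.prod(per_dim))  # assumed form of p_i in Equation (1)

# delta: side lengths of the hyperrectangular lattice unit region (predetermined);
# mean, scale: statistics indicated by the auxiliary data.
p_i = representative_vector_probability(
    y_hat=np.array([0.3, -0.1]), delta=np.array([1.0, 1.0]),
    mean=np.zeros(2), scale=np.ones(2))
```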
  • equations (1) and (2) are obtained by assuming that the covariance between dimensions is zero.
  • the shape of the lattice unit area is a hyperrectangular parallelepiped
  • the covariance between dimensions can be reduced to 0 by rotating the coordinate axes of the lattice space so as to be parallel to each side of the hyperrectangular parallelepiped. Therefore, equations (1) and (2) are equations that are established by rotation of the coordinate axes even if the covariance between dimensions is not zero. It is a well-known fact in linear algebra that the rotation of the coordinate axes is a unitary transformation, so it is not a transformation that changes the contents of equations (1) and (2). Note that the process of setting the covariance to 0 is the process of diagonalizing the matrix representing the variance between dimensions.
  • the main data-side probability estimation process uses the result of integrating the parameterized probability density function with the grid unit region having a hyperrectangular parallelepiped shape as the integration region, and uses each of the tensors representing the main data feature quantity. Estimate the occurrence probability of an element.
  • next, hyperrectangular parallelepiped determination processing, that is, processing for determining the shape and size of a lattice unit region that at least satisfies the condition of being a hyperrectangular parallelepiped, will be described.
  • in the hyperrectangular parallelepiped determination processing, the lattices adjacent to the origin lattice are first calculated.
  • the origin lattice is the lattice point located at the origin of the lattice space. The adjacent lattices are the lattice points next to the origin.
  • next, in the hyperrectangular parallelepiped determination processing, hyperrectangular parallelepipeds satisfying the hyperrectangular parallelepiped determination conditions are calculated.
  • the hypercube determination conditions include the condition that the lattice unit area of the origin lattice does not overlap with the lattice unit area of each adjacent lattice and the condition that the volume of the hypercube is equal to the volume of the Voronoi region.
  • the superscripts in the hyperrectangular parallelepiped notation [s1^a, s2^b, …] indicate that each side from the 1st dimension to the a-th dimension has length s1, and each side from the (a+1)-th dimension to the (a+b)-th dimension has length s2.
  • that is, in this notation the base character to which a superscript is attached is a side length, and the superscript denotes that, for a run of consecutive dimensions whose number equals the value of the superscript, the sides of the hyperrectangular parallelepiped all have that same length s.
  • s^a therefore indicates that, for a run of a consecutive dimensions, the side length of the hyperrectangular parallelepiped is s. Note that information on the order of the dimensions is required to determine whether dimensions are consecutive, but the order of the dimensions is a predetermined order.
  • [±1^2, 0^6] indicates that each of the two ±1 entries may be +1 or −1. Therefore, [±1^2, 0^6] specifically denotes [1, 1, 0^6], [−1, 1, 0^6], [1, −1, 0^6], and [−1, −1, 0^6].
  • the hyperrectangular parallelepiped of each adjacent lattice does not overlap with the hyperrectangular parallelepiped of the origin lattice.
  • One condition is that 7 or more of the sides (elements) of the hyperrectangular parallelepiped are 1 or less.
  • Another condition is that one or more of the sides (elements) of the hyperrectangular parallelepiped is 1/2 or less.
  • for example, [1/2, 1^6, 2] is obtained in the case of the eight-dimensional E8 lattice space.
  • the range of the lattice unit area is the range enclosed by the hyperrectangular parallelepiped represented by Equation (5) below.
  • the range of the lattice unit area is the range enclosed by the hyperrectangular parallelepiped represented by the following equation (6).
  • the range of the lattice unit area is the range enclosed by the hyperrectangular parallelepiped represented by the following equation (7).
  • the range of the hyperrectangular parallelepiped (the shape and size of the lattice unit area) is determined.
  • the main entropy acquisition unit 110 acquires the entropy of the main data feature amount based on the estimation result of the main data side probability estimation unit 109 .
  • the entropy of the main data feature quantity will be referred to as the main entropy.
  • the main data side decoding unit 111 executes main data feature amount decoding processing.
  • the main data feature amount decoding process is a process for processing information obtained based on the main data feature amount, and is a process for decoding the processing target.
  • the processing target of the main data feature amount decoding process executed by the main data side decoding unit 111 is the main data feature amount with noise.
  • the contents of the decoding process of the main data side decoding unit 111 are updated by learning.
  • the reconstruction error calculation unit 112 calculates the difference between the decoding result of the main data side decoding unit 111 and the main data acquired by the main data acquisition unit 101 .
  • the difference between the decoding result of the main data side decoding unit 111 and the main data acquired by the main data acquisition unit 101 is called a reconstruction error.
  • the difference between the decoding result of the main data side decoding unit 111 and the main data acquired by the main data acquisition unit 101 may be represented by, for example, the sum of mean square errors or binary cross entropy.
  • the optimization unit 113 updates the learning network 100 based on the auxiliary entropy, the primary entropy and the reconstruction error.
  • auxiliary entropy, primary entropy, and reconstruction error are all examples of outputs of learning network 100 .
  • the optimization unit 113 updates the learning network 100 so as to reduce the reconstruction error, the main entropy, and the auxiliary entropy.
  • the symbol L represents an objective function.
  • Symbol D represents the reconstruction error.
  • the symbol lambda is a predetermined constant.
  • the symbol R_y represents the main entropy.
  • the symbol R_z represents the auxiliary entropy.
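  • the equation for the objective function is not reproduced in this text; given the symbols above and the RD cost D + λR described earlier, it presumably has the rate–distortion form L = D + λ(R_y + R_z), and a smaller L corresponds to a smaller RD cost.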
  • a small entropy means a short code length, so the optimization unit 113 updates the learning network 100 so as to reduce the entropies. Also, since a smaller reconstruction error means higher accuracy of self-encoding, the optimization unit 113 updates the learning network 100 so as to reduce the reconstruction error.
  • the learning network 100 is updated by solving the minimization problem of the objective function L using the gradient method. That is, the learning network 100 is updated by updating the values of the parameters of the learning network 100 by, for example, the error backpropagation method.
  • updating the learning network 100 means updating the contents of the processing executed by each of the main data side encoding unit 102, the auxiliary data side encoding unit 103, the auxiliary data side probability estimation unit 106, the auxiliary data side decoding unit 107, and the main data side decoding unit 111.
  • the learning network 100 includes a main data side noise adding section 104 .
  • the processing of the main data side noise addition unit 104 is not itself a quantization process.
  • nevertheless, the performance of self-encoding using vector quantization is improved by performing learning precisely because the processing of the main data side noise addition unit 104 is included. The reason is explained below.
  • the main data side noise addition unit 104 adds noise to the noise addition target vector rather than applying hard quantization, whose gradient is 0 almost everywhere. Therefore, the gradient does not become 0 when learning is performed with the main data side noise addition unit 104 included.
  • peripheral processing is the processing that produces the information used in vector quantization. Specifically, the peripheral processing is the processing executed by each of the main data side encoding unit 102, the auxiliary data side encoding unit 103, the auxiliary data side probability estimation unit 106, the auxiliary data side decoding unit 107, and the main data side decoding unit 111.
  • FIG. 7 is a diagram showing an example of the flow of processing executed by the learning unit 10 in the embodiment.
  • the main data x is an N-dimensional vector having N elements from x1 to xN .
  • Each element from x 1 to x N is a tensor. Therefore, each element from x1 to xN may be a scalar or a vector.
  • the function f enc (x) is a function (hereinafter referred to as "main data encoding function") that expresses the encoding process of the main data x.
  • the main data feature quantity y is a tensor composed of k K-dimensional vectors.
  • the auxiliary data side encoding unit 103 executes auxiliary feature amount acquisition processing (step S103).
  • the function g enc (y) is a function that expresses the processing of encoding the main data feature quantity y (hereinafter referred to as "main data feature quantity encoding function").
  • the auxiliary feature z is a tensor such as a vector.
  • the main-data-side noise addition unit 104 executes noise-added main data feature amount acquisition processing (step S104).
  • that is, y_i ← y_i + u_y, where y_i represents the i-th vector element of the main data feature quantity y and u_y represents the noise.
  • the auxiliary data side noise addition unit 105 executes auxiliary feature quantity acquisition processing with noise (step S105).
  • w is an integer of 1 or more.
  • that is, z_i ← z_i + u_z, where z_i represents the i-th element of the auxiliary feature z and u_z represents the noise.
  • the auxiliary data side probability estimation unit 106 executes auxiliary data side probability estimation processing (step S106).
  • the auxiliary data-side probability estimation unit 106 estimates auxiliary data-side probabilities by executing auxiliary data-side probability estimation processing.
  • the auxiliary data side probability is specifically represented by the following equation (8).
  • Equation (8) represents the auxiliary data side probability.
  • the symbol h is a parameterized auxiliary feature cumulative distribution function.
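  • Equation (8) is not reproduced in this text; given the cumulative-distribution relationship of FIG. 5 and the scalar noise of width 1, it presumably has the form p(z^_i) = h(z^_i + 1/2) − h(z^_i − 1/2), where z^ denotes the noise-added auxiliary feature.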
  • auxiliary data side decoding unit 107 executes auxiliary feature quantity decoding processing on the auxiliary feature quantity with noise (step S107).
  • the function g dec ( ⁇ ) is a function (hereinafter referred to as “auxiliary feature quantity decoding function”) that expresses the process of decoding the auxiliary feature quantity with noise ⁇ .
  • the auxiliary data ⁇ is a tensor such as a vector.
  • the auxiliary entropy acquisition unit 108 acquires auxiliary entropy based on the auxiliary data side probability (step S108). Specifically, the auxiliary entropy acquisition unit 108 acquires the auxiliary entropy by executing the process represented by the following formula (9).
  • the symbol on the left side of Equation (9) represents the auxiliary entropy.
  • the main data side probability estimation unit 109 executes main data side probability estimation processing (step S109).
  • the main-data-side probability estimation unit 109 estimates the main-data-side probability based on the noise-added main-data feature amount and the auxiliary data by executing the main-data-side probability estimation process.
  • the main data side probability is specifically represented by the above-described formula (1).
  • the primary entropy acquisition unit 110 acquires primary entropy based on the primary data side probability (step S110). Specifically, the primary entropy acquisition unit 110 acquires the primary entropy by executing the process represented by the following formula (10).
  • the symbol on the left side of Equation (10) represents the main entropy.
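  • Equations (9) and (10) are likewise not reproduced in this text; based on the description above, they presumably compute the entropies from the estimated occurrence probabilities as negative log-probabilities summed over the elements, e.g. R_z = Σ_i −log₂ p(z^_i) for the auxiliary entropy and R_y = Σ_i −log₂ p_i for the main entropy (possibly averaged over samples).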
  • the main data side decoding unit 111 executes main data feature amount decoding processing on the main data feature amount with noise (step S111).
  • the main data feature amount with noise is decoded.
  • the function f dec ( ⁇ ) is a function (hereinafter referred to as “main data feature quantity decoding function”) that expresses the process of decoding the main data feature quantity ⁇ with noise.
  • the decoded main data x ⁇ is a tensor such as a vector.
  • the reconstruction error calculation unit 112 acquires the difference between the decoded main data and the main data acquired by the main data acquisition unit 101 (step S112).
  • the difference between the decoded main data and the main data acquired by the main data acquisition unit 101 is the reconstruction error.
  • the optimization unit 113 updates the learning network 100 based on the auxiliary data entropy, main data entropy, and reconstruction error (step S113).
  • the optimization unit 113 determines whether or not a predetermined termination condition (hereinafter referred to as "learning termination condition") regarding learning is satisfied (step S114).
  • the learning end condition is, for example, a condition that the learning network 100 has been updated a predetermined number of times.
  • if the learning end condition is satisfied (step S114: YES), the process ends. The peripheral processing at the time when the learning end condition is satisfied is used for vector quantization as the learned peripheral processing.
  • if the learning end condition is not satisfied (step S114: NO), the process returns to step S101.
  • step S101 to step S114 may be executed in any order as long as it does not violate the law of causality.
  • the auxiliary data side probability estimation unit 106 may acquire a previously given probability distribution by reading from a predetermined storage device or the like. In such a case, the contents of the auxiliary data side probability estimation process are not updated by learning. Therefore, the learned auxiliary data side probability estimation process is the same as the auxiliary data side probability estimation process before learning.
  • updating the contents of the auxiliary data side probability estimation process is, more specifically, updating the parameterized auxiliary feature quantity cumulative distribution function h. Therefore, when the auxiliary data side probability estimating unit 106 acquires a previously given probability distribution by reading from a predetermined storage device or the like, the trained parameterized auxiliary feature quantity cumulative distribution function h is the parameterized auxiliary feature before learning. It is the same as the quantity cumulative distribution function h.
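  • The flow of FIG. 7 (steps S101 to S114) can be summarized as an ordinary end-to-end training loop. The following minimal PyTorch-style sketch only illustrates the structure described above: the linear layers are placeholders for the units 102, 103, 107, and 111, the tensor sizes are arbitrary, and scalar uniform noise is used in place of the vector noise of the main data side noise addition unit 104.

```python
import torch
from torch import nn

f_enc = nn.Linear(16, 8)   # stands in for the main data side encoding unit 102
g_enc = nn.Linear(8, 4)    # stands in for the auxiliary data side encoding unit 103
g_dec = nn.Linear(4, 16)   # stands in for the auxiliary data side decoding unit 107
f_dec = nn.Linear(8, 16)   # stands in for the main data side decoding unit 111
params = (list(f_enc.parameters()) + list(g_enc.parameters())
          + list(g_dec.parameters()) + list(f_dec.parameters()))
opt = torch.optim.Adam(params, lr=1e-4)
lam = 0.01                                       # Lagrangian constant

def bits(p):                                     # code length estimate: sum of -log2 p
    return -torch.log2(p.clamp_min(1e-9)).sum()

for step in range(100):                          # until the learning end condition holds
    x = torch.randn(16)                          # S101: main data (dummy sample)
    y = f_enc(x)                                 # S102: main data feature amount
    z = g_enc(y)                                 # S103: auxiliary feature amount
    y_hat = y + torch.rand_like(y) - 0.5         # S104: noise (scalar stand-in)
    z_hat = z + torch.rand_like(z) - 0.5         # S105: scalar noise
    p_z = torch.sigmoid(z_hat + 0.5) - torch.sigmoid(z_hat - 0.5)  # S106
    stats = g_dec(z_hat)                         # S107: auxiliary data (Gaussian stats)
    mean = stats[:8]
    scale = nn.functional.softplus(stats[8:]) + 1e-6
    gauss = torch.distributions.Normal(mean, scale)
    p_y = gauss.cdf(y_hat + 0.5) - gauss.cdf(y_hat - 0.5)          # S109
    R_z, R_y = bits(p_z), bits(p_y)              # S108, S110: auxiliary / main entropy
    D = ((f_dec(y_hat) - x) ** 2).mean()         # S111, S112: reconstruction error
    L = D + lam * (R_y + R_z)                    # S113: objective (assumed form)
    opt.zero_grad()
    L.backward()
    opt.step()
```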
  • the learning unit 10 updates the encoding and decoding processes in the self-encoding process using vector quantization through learning.
  • the self-encoding process using vector quantization uses a main data feature quantity, which is a feature quantity to be self-encoded, and an auxiliary feature quantity, which is a feature quantity of the main data feature quantity. Furthermore, in the self-encoding process using vector quantization, entropy encoding is performed on the result of vector quantization of the main data feature quantity and entropy encoding is performed on the result of scalar quantization of the auxiliary feature quantity.
  • the encoding process in the self-encoding process using such vector quantization includes the main data feature amount acquisition process executed by the main data side encoding unit 102, the auxiliary data side encoding unit 103 executes auxiliary feature amount acquisition processing. Further, the decoding process in the self-encoding process using such vector quantization specifically includes the auxiliary feature amount decoding process executed by the auxiliary data side decoding unit 107 and the main data side decoding unit 111 and main data feature amount decoding processing to be executed.
  • "learned" means the state at the time when the learning end condition is satisfied. In the following, as an example of a device that performs self-encoding using vector quantization with the learned peripheral processing, the self-encoding device 2, which performs encoding and decoding, will be described.
  • the self-encoding device 2 is a type of self-encoder (autoencoder).
  • FIG. 8 is a first explanatory diagram for explaining the outline of the self-encoding device 2 in the embodiment.
  • FIG. 9 is a second explanatory diagram for explaining the outline of the self-encoding device 2 in the embodiment. More specifically, FIG. 8 is an explanatory diagram for explaining the encoding process executed by the self-encoding device 2, and FIG. 9 is an explanatory diagram for explaining the decoding process executed by the self-encoding device 2. is.
  • the self-encoding device 2 is a kind of self-encoder, so it has an encoder and a decoder.
  • the autoencoding device 2 comprises an encoder 200 and a decoder 212 .
  • the encoder 200 includes a self-encoding target acquisition unit 201, a learned main data side encoding unit 202, a learned auxiliary data side encoding unit 203, a vector quantization unit 204, a scalar quantization unit 205, a learned auxiliary data side probability estimation unit 206, a learned auxiliary data side decoding unit 207, an auxiliary entropy coding unit 208, a main data side probability estimation unit 209, a main entropy coding unit 210, and a data multiplexing unit 211.
  • the decoder 212 includes an encoded data acquisition unit 213, a data separation unit 214, an auxiliary entropy decoding unit 215, a trained auxiliary data side decoding unit 216, a main entropy decoding unit 217, and a trained main data side decoding unit 218.
  • the self-encoding target acquisition unit 201 acquires data to be self-encoded as main data.
  • An object of self-encoding is hereinafter referred to as a self-encoding object.
  • the learned main data side encoding unit 202 executes a learned main data feature amount acquisition process for the self-encoding target. By executing the learned main data feature amount acquisition process, the learned main data side encoding unit 202 acquires the main data feature amount to be self-encoded.
  • the learned auxiliary data side encoding unit 203 executes learned auxiliary feature amount acquisition processing on the main data feature amount to be self-encoded. By executing the learned auxiliary feature amount acquisition process, the learned auxiliary data side encoding unit 203 acquires the auxiliary feature amount to be self-encoded.
  • the vector quantization unit 204 executes vector quantization processing on the main data feature quantity to be self-encoded. By executing the vector quantization process, the vector quantization unit 204 acquires the vector-quantized main data feature amount (hereinafter referred to as "vector quantized feature amount”) to be self-encoded.
  • vector quantized feature amount the vector-quantized main data feature amount
  • the scalar quantization unit 205 performs scalar quantization processing on the auxiliary feature quantity to be auto-encoded. By executing the scalar quantization process, the scalar quantization unit 205 acquires a scalar-quantized auxiliary feature amount to be self-encoded (hereinafter referred to as “scalar quantized feature amount”).
  • the learned auxiliary data side probability estimation unit 206 executes the learned auxiliary data side probability estimation process.
  • the learned auxiliary data-side probability estimating unit 206 estimates the auxiliary data-side probability of the self-encoding target based on the scalar quantized feature value by executing the learned auxiliary data-side probability estimation process.
  • the learned auxiliary data side decoding unit 207 executes learned auxiliary feature quantity decoding processing on the scalar quantized feature quantity. That is, the learned auxiliary data side decoding unit 207 decodes the scalar quantized feature quantity.
  • the information obtained by decoding the scalar quantized feature quantity will be referred to as quantized auxiliary data. Therefore, the learned auxiliary data side decoding unit 207 obtains quantized auxiliary data by decoding the scalar quantized feature amount.
  • the auxiliary entropy encoding unit 208 entropy-encodes the scalar quantized feature amount based on the scalar quantized feature amount and the probability of the auxiliary data to be self-encoded.
  • Entropy coding is, for example, arithmetic coding.
  • the main-data-side probability estimation unit 209 estimates the main-data-side probability of the self-encoding target based on the vector quantized feature amount and the quantized auxiliary data.
  • the primary entropy encoding unit 210 performs entropy encoding of the vector quantized feature quantity based on the vector quantized feature quantity and the probability of the main data to be self-encoded.
  • Entropy coding is, for example, arithmetic coding.
  • the data multiplexing unit 211 outputs the entropy-encoded vector quantized feature amount and the entropy-encoded scalar quantized feature amount to the decoder 212 . In this manner, encoder 200 encodes the self-encoding object.
  • the encoded data acquisition unit 213 acquires the entropy-encoded vector quantized feature amount and the entropy-encoded scalar quantized feature amount.
  • the data separation unit 214 acquires the entropy-encoded vector quantized feature amount and the entropy-encoded scalar quantized feature amount acquired by the encoded data acquisition unit 213 .
  • the data separation unit 214 outputs the entropy-encoded scalar quantized feature amount to the auxiliary entropy decoding unit 215 and outputs the entropy-encoded vector quantized feature amount to the primary entropy decoding unit 217 .
  • the trained auxiliary entropy decoding unit 215 performs entropy decoding on the entropy-encoded scalar quantized feature using the trained parameterized auxiliary feature cumulative distribution function.
  • the learned auxiliary data side decoding unit 216 executes the learned auxiliary feature quantity decoding process on the result of entropy decoding by the trained auxiliary entropy decoding unit 215 .
  • the primary entropy decoding unit 217 performs entropy decoding of the entropy-encoded vector quantized feature amount based on the entropy-encoded vector quantized feature amount and the result of the learned auxiliary feature amount decoding process by the learned auxiliary data side decoding unit 216. More specifically, the primary entropy decoding unit 217 performs entropy decoding on the entropy-encoded vector quantized feature using the decoded cumulative distribution function.
  • the decoded cumulative distribution function is a parameterized main data feature quantity cumulative distribution function whose parameter values are values indicated by the result of learned auxiliary feature quantity decoding processing by the learned auxiliary data side decoding unit 216 .
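The following is a minimal sketch of the kind of hyperrectangular integration used to turn such a parameterized main data feature cumulative distribution function into per-lattice-point probabilities. It assumes an axis-aligned cell with side lengths delta around each lattice point and, purely for illustration, a factorized Gaussian density whose per-dimension mean and scale would come from the decoded auxiliary data; the actual parameterized density and cell shape of the embodiment may differ.

```python
import math
from typing import Sequence

def gaussian_cdf(x: float, mean: float, scale: float) -> float:
    # Cumulative distribution function of a Gaussian, via the error function.
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))

def hyperrectangle_probability(lattice_point: Sequence[float],
                               delta: Sequence[float],
                               mean: Sequence[float],
                               scale: Sequence[float]) -> float:
    """Integrate the factorized density over the axis-aligned cell
    [c_k - delta_k / 2, c_k + delta_k / 2] around the lattice point."""
    p = 1.0
    for c, d, m, s in zip(lattice_point, delta, mean, scale):
        p *= gaussian_cdf(c + d / 2.0, m, s) - gaussian_cdf(c - d / 2.0, m, s)
    return p

# Example: probability mass assigned to the 2-D lattice point (0, 1).
print(hyperrectangle_probability([0.0, 1.0], [1.0, 1.0], [0.1, 0.8], [1.0, 0.5]))
```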
  • the learned main data side decoding unit 218 executes the learned main data feature value decoding process on the result of decoding by the main entropy decoding unit 217 .
  • the decoder 212 decodes the self-encoded object encoded by the encoder 200. Also, in this manner, the self-encoding device 2 self-encodes the self-encoding target.
  • FIG. 10 is a flow chart showing an example of the flow of processing executed by the encoder 200 in the embodiment.
  • the autoencoding object X is an N-dimensional vector with N elements X_1 to X_N. Each of the elements X_1 to X_N is a tensor; therefore, each element X_1 to X_N may be a scalar or a vector.
  • the learned main data side encoding unit 202 executes a learned main data feature amount acquisition process (step S202). That is, the learned main data side encoding unit 202 encodes the self-encoding target X.
  • the function F_enc(X) is the learned main data encoding function.
  • the main data feature quantity Y is a tensor such as a vector.
  • the learned auxiliary data side encoding unit 203 executes a learned auxiliary feature amount acquisition process (step S203).
  • the function G_enc(Y) is a learned main data feature quantity encoding function.
  • the auxiliary feature Z is a tensor such as a vector.
  • the vector quantization unit 204 performs vector quantization on the main data feature quantity Y to be self-encoded (step S204).
  • the vector quantized feature Ŷ = [Ŷ_1, Ŷ_2, …, Ŷ_k] = [Q(Y_1), Q(Y_2), …, Q(Y_k)] is obtained.
  • Y_i represents the i-th element of the main data feature quantity Y to be auto-encoded.
  • Ŷ_i represents the i-th element of the vector quantized feature Ŷ.
  • Q is a function represented by Equation (11) below.
  • The set symbol appearing in Equation (11) means the set of all lattice points.
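The body of Equation (11) is not shown here as text. As an illustration of a nearest-lattice-point quantization function Q, the sketch below uses the D_K lattice (integer vectors whose coordinates sum to an even number), which has a simple closed-form nearest-point rule; the embodiment may instead use other lattices such as the A2 or E8 lattice.

```python
from typing import List

def quantize_dk(y: List[float]) -> List[int]:
    """Nearest point of the D_K lattice (integer vectors with an even
    coordinate sum) to the vector y."""
    rounded = [round(v) for v in y]
    if sum(rounded) % 2 == 0:
        return rounded
    # Parity is odd: re-round the coordinate with the largest rounding error
    # toward its second-nearest integer, restoring an even coordinate sum.
    errors = [v - r for v, r in zip(y, rounded)]
    k = max(range(len(y)), key=lambda i: abs(errors[i]))
    rounded[k] += 1 if errors[k] > 0 else -1
    return rounded

print(quantize_dk([0.6, 0.2]))  # -> [0, 0]
```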
  • the scalar quantization unit 205 performs scalar quantization on the auxiliary feature Z to be self-encoded (step S205).
  • the learned auxiliary data side probability estimation unit 206 executes the learned auxiliary data side probability estimation process (step S206).
  • the learned auxiliary data-side probability estimation unit 206 estimates the auxiliary data-side probability of the self-encoding target based on the scalar quantized feature value Z ⁇ by executing the learned auxiliary data-side probability estimation process.
  • the auxiliary data side probability to be self-encoded is represented by the following equation (12).
  • Equation (12) represents the auxiliary data side probability of the self-encoding target.
  • the symbol H is a trained parameterized auxiliary feature cumulative distribution function.
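Equation (12) is likewise not reproduced here. A plausible minimal form, assuming a unit quantization step, evaluates the trained cumulative distribution function H at the two bin edges around each scalar-quantized element; in the sketch below, H is modeled purely for illustration as a logistic (sigmoid) CDF with learned location and scale parameters.

```python
import math

def logistic_cdf(x: float, loc: float, scale: float) -> float:
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))

def aux_element_probability(z_hat: float, loc: float, scale: float) -> float:
    """Probability mass of the scalar-quantized value z_hat under H,
    assuming a unit quantization step."""
    return logistic_cdf(z_hat + 0.5, loc, scale) - logistic_cdf(z_hat - 0.5, loc, scale)

print(aux_element_probability(0.0, loc=0.2, scale=1.0))
```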
  • the function G_dec(Ẑ) is a learned auxiliary feature decoding function.
  • the quantization auxiliary data ⁇ is a tensor such as a vector.
  • the auxiliary entropy encoding unit 208 entropy-encodes the scalar quantized feature Z ⁇ based on the scalar quantized feature Z ⁇ and the probability of the auxiliary data to be self-encoded (step S208).
  • the main-data-side probability estimating unit 209 estimates the main-data-side probability of the self-encoding target based on the vector quantized feature Y ⁇ and the quantized auxiliary data ⁇ (step S209).
  • the main entropy encoding unit 210 entropy-encodes the vector quantized feature Y ⁇ based on the vector quantized feature Y ⁇ and the probability of the main data to be self-encoded (step S210).
  • the data multiplexing unit 211 outputs the entropy-encoded vector quantized feature amount and the entropy-encoded scalar quantized feature amount to the decoder 212 (step S211).
  • step S201 to step S211 is an example of encoding processing by the encoder 200. Note that each process from step S201 to step S211 may be executed in any order as long as it does not violate causality.
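The encoder flow of steps S201 to S211 can be condensed into the following sketch. The component operations are passed in as callables, and the dict-based multiplexing is a placeholder; none of these names are taken from the publication.

```python
from typing import Any, Callable, Dict

def encode(x: Any,
           f_enc: Callable, g_enc: Callable, g_dec: Callable,
           vector_quantize: Callable, scalar_quantize: Callable,
           aux_probability: Callable, main_probability: Callable,
           entropy_encode: Callable) -> Dict[str, Any]:
    y = f_enc(x)                                 # S202: main data feature amount Y
    z = g_enc(y)                                 # S203: auxiliary feature amount Z
    y_hat = vector_quantize(y)                   # S204: vector quantized feature
    z_hat = scalar_quantize(z)                   # S205: scalar quantized feature
    p_aux = aux_probability(z_hat)               # S206: auxiliary data side probability
    aux_bits = entropy_encode(z_hat, p_aux)      # S208: entropy-encode the aux feature
    psi = g_dec(z_hat)                           # quantized auxiliary data
    p_main = main_probability(y_hat, psi)        # S209: main data side probability
    main_bits = entropy_encode(y_hat, p_main)    # S210: entropy-encode the main feature
    return {"main": main_bits, "aux": aux_bits}  # S211: multiplexed output
```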
  • FIG. 11 is a flowchart showing an example of the flow of processing executed by the decoder 212 in the embodiment.
  • the encoded data acquisition unit 213 acquires the result of encoding by the encoder 200 (step S301). Specifically, the results of encoding by the encoder 200 are the entropy-encoded vector quantized feature amount output in step S211 and the entropy-encoded scalar quantized feature amount.
  • the data separation unit 214 separates the entropy-encoded scalar quantized feature amount acquired in step S301 from the entropy-encoded vector quantized feature amount acquired in step S301 (step S302).
  • here, the separation means that the entropy-encoded scalar quantized feature amount obtained in step S301 is output to the auxiliary entropy decoding unit 215, and the entropy-encoded vector quantized feature amount obtained in step S301 is output to the primary entropy decoding unit 217.
  • the trained auxiliary entropy decoding unit 215 performs entropy decoding on the entropy-encoded scalar quantized feature using the trained parameterized auxiliary feature cumulative distribution function (step S303).
  • the learned auxiliary data side decoding unit 216 executes the learned auxiliary feature quantity decoding process on the result of entropy decoding by the trained auxiliary entropy decoding unit 215 (step S304).
  • the primary entropy decoding unit 217 performs entropy decoding on the entropy-encoded vector quantized feature using the decoded cumulative distribution function (step S305).
  • the learned main data side decoding unit 218 executes the learned main data feature quantity decoding process on the result of decoding by the primary entropy decoding unit 217 (step S306).
  • a series of processing from step S301 to step S306 is an example of decoding processing by the decoder 212. It should be noted that each process from step S301 to step S306 is executed after the encoding process by the encoder 200 such as step S211 is executed, and may be executed in any order as long as it does not violate causality.
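A corresponding sketch of the decoder flow of steps S301 to S306, using the same placeholder-callable convention as the encoder sketch above:

```python
from typing import Any, Callable, Dict

def decode(coded: Dict[str, Any],
           aux_entropy_decode: Callable, g_dec: Callable,
           main_entropy_decode: Callable, f_dec: Callable) -> Any:
    z_hat = aux_entropy_decode(coded["aux"])          # S303: uses the trained aux CDF
    psi = g_dec(z_hat)                                # S304: learned aux decoding
    y_hat = main_entropy_decode(coded["main"], psi)   # S305: uses the decoded CDF
    return f_dec(y_hat)                               # S306: learned main data decoding
```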
  • FIG. 12 is a diagram showing an example of the hardware configuration of the learning device 1 according to the embodiment.
  • the learning device 1 includes a control unit 11 including a processor 91 such as a CPU (Central Processing Unit) and a memory 92 that are connected via a bus, and executes a program.
  • the learning device 1 functions as a device including a control unit 11, an input unit 12, a communication unit 13, a storage unit 14, and an output unit 15 by executing a program.
  • the processor 91 reads the program stored in the storage unit 14 and stores the read program in the memory 92 .
  • the processor 91 executes the program stored in the memory 92 , whereby the learning device 1 functions as a device comprising the control section 11 , the input section 12 , the communication section 13 , the storage section 14 and the output section 15 .
  • the control unit 11 controls the operations of various functional units included in the learning device 1.
  • the control unit 11 controls the operation of the output unit 15, for example.
  • the control unit 11 records, for example, various information generated by learning in the storage unit 14 .
  • the input unit 12 includes input devices such as a mouse, keyboard, and touch panel.
  • the input unit 12 may be configured as an interface that connects these input devices to the learning device 1 .
  • the input unit 12 receives input of various information to the learning device 1 .
  • the communication unit 13 includes a communication interface for connecting the learning device 1 to an external device.
  • the communication unit 13 communicates with an external device via wire or wireless.
  • the external device is, for example, a device that transmits main data used for learning.
  • the communication unit 13 acquires main data used for learning through communication with the device that is the transmission source of the main data.
  • the external device is for example the autoencoding device 2 .
  • the communication unit 13 transmits the network learning result to the self-encoding device 2 through communication with the self-encoding device 2 .
  • the main data does not necessarily have to be input via the communication unit 13 and may be input to the input unit 12 .
  • the storage unit 14 is configured using a computer-readable storage medium device such as a magnetic hard disk device or a semiconductor storage device.
  • the storage unit 14 stores various information regarding the learning device 1 .
  • the storage unit 14 stores information input via the input unit 12 or the communication unit 13, for example.
  • the storage unit 14 stores, for example, various information generated by execution of learning.
  • the storage unit 14 pre-stores, for example, probability distributions used to acquire the occurrence probability of each element of the tensor indicating the auxiliary feature amount.
  • the storage unit 14 stores, for example, a parameterized auxiliary feature cumulative distribution function in advance.
  • the storage unit 14 stores in advance, for example, the parameterized main data feature quantity cumulative distribution function.
  • the storage unit 14 stores, for example, representative vector information in advance.
  • the storage unit 14 stores, for example, the result of hyperrectangular parallelepiped division.
  • the storage unit 14 stores, for example, the initial values of the parameters of the learning network 100 in advance.
  • the initial value is, for example, a random value.
  • the storage unit 14 stores, for example, network learning results.
  • the output unit 15 outputs various information.
  • the output unit 15 includes a display device such as a CRT (Cathode Ray Tube) display, a liquid crystal display, an organic EL (Electro-Luminescence) display, or the like.
  • the output unit 15 may be configured as an interface that connects these display devices to the learning device 1 .
  • the output unit 15 outputs information input to the input unit 12, for example.
  • the output unit 15 may display the result of learning, for example.
  • FIG. 13 is a diagram showing an example of the configuration of the control unit 11 included in the learning device 1 according to the embodiment.
  • the control unit 11 includes a learning unit 10 , a memory control unit 120 , a communication control unit 130 and an output control unit 140 .
  • the storage control unit 120 records various information in the storage unit 14 .
  • the communication control section 130 controls the operation of the communication section 13 .
  • the output control section 140 controls the operation of the output section 15 .
  • FIG. 14 is a diagram showing an example of the hardware configuration of the self-encoding device 2 in the embodiment.
  • the self-encoding device 2 includes a control section 21 including a processor 93 such as a CPU (Central Processing Unit) and a memory 94 that are connected via a bus, and executes a program.
  • the self-encoding device 2 functions as a device comprising a control section 21, an input section 22, a communication section 23, a storage section 24 and an output section 25 by executing a program.
  • the processor 93 reads the program stored in the storage unit 24 and stores the read program in the memory 94 .
  • the processor 93 executes the program stored in the memory 94 so that the self-encoding device 2 functions as a device comprising the control section 21 , the input section 22 , the communication section 23 , the storage section 24 and the output section 25 .
  • the control unit 21 controls operations of various functional units provided in the self-encoding device 2 .
  • the control unit 21 controls the operation of the output unit 25, for example.
  • the control unit 21 records various information generated by encoding by the encoder 200 and decoding by the decoder 212 in the storage unit 24, for example.
  • the input unit 22 includes input devices such as a mouse, keyboard, and touch panel.
  • the input unit 22 may be configured as an interface connecting these input devices to the autoencoding device 2 .
  • the input unit 22 receives input of various information to the self-encoding device 2 .
  • the communication unit 23 includes a communication interface for connecting the self-encoding device 2 to an external device.
  • the communication unit 23 communicates with an external device via wire or wireless.
  • the external device is, for example, the device that is the source of the self-encoding.
  • the communication unit 23 acquires the self-encoding target through communication with the device that is the transmission source of the self-encoding target.
  • the external device is the learning device 1, for example.
  • the communication unit 23 receives network learning results through communication with the learning device 1 . Note that the self-encoding target does not necessarily have to be input via the communication unit 23 and may be input to the input unit 22 .
  • the storage unit 24 is configured using a computer-readable storage medium device such as a magnetic hard disk device or a semiconductor storage device.
  • a storage unit 24 stores various information about the self-encoding device 2 .
  • the storage unit 24 stores information input via the input unit 22 or the communication unit 23, for example.
  • the storage unit 24 stores various kinds of information generated by executing encoding by the encoder 200 and decoding by the decoder 212, for example.
  • the storage unit 24 stores, for example, network learning results.
  • the storage unit 24 stores, for example, representative vector information in advance.
  • the storage unit 24 stores, for example, the result of hyperrectangular parallelepiped division.
  • the output unit 25 outputs various information.
  • the output unit 25 includes a display device such as a CRT (Cathode Ray Tube) display, a liquid crystal display, an organic EL (Electro-Luminescence) display, or the like.
  • the output unit 25 may be configured as an interface that connects these display devices to the self-encoding device 2 .
  • the output unit 25 outputs information input to the input unit 22, for example.
  • the output unit 25 may output, for example, the result of self-encoding of the object to be self-encoded.
  • FIG. 15 is a diagram showing an example of the configuration of the control unit 21 included in the self-encoding device 2 according to the embodiment.
  • the control unit 21 includes a self-encoding execution unit 20 , a memory control unit 220 , a communication control unit 230 and an output control unit 240 .
  • the self-encoding execution unit 20 performs self-encoding on a self-encoding target.
  • the autoencoding execution unit 20 comprises an encoder 200 and a decoder 212 .
  • the self-encoding execution unit 20 carries out encoding by the encoder 200 and decoding by the decoder 212 to self-encode the object to be self-encoded.
  • the storage control unit 220 records various information in the storage unit 24.
  • a communication control unit 230 controls the operation of the communication unit 23 .
  • the output control section 240 controls the operation of the output section 25 .
  • the learning device 1 configured in this way learns the peripheral processing for vector quantization using representative vector information, which is information indicating the positions of representative vectors arranged in a lattice in a vector space, as in LatticeVQ. The learning device 1 then estimates the occurrence probability of each representative vector using the result of the hyperrectangular parallelepiped partitioning. As described above, when such representative vectors are used, Voronoi division could be used to estimate the occurrence probability of the representative vectors, but the required integration over Voronoi regions is not easy.
  • By estimating the occurrence probability of a representative vector using the hyperrectangular parallelepipeds obtained by hyperrectangular parallelepiped partitioning, the learning device 1 can reduce the load required to obtain self-encoding processing using vector quantization. That is, the burden is reduced from the learning stage onward, up to the realization of self-encoding using vector quantization. Therefore, the learning device 1 can reduce the load required for self-encoding using vector quantization.
  • Since the learning device 1 configured in this way uses representative vectors to learn the peripheral processing for vector quantization, it is possible to reduce the burden that would otherwise be required for learning the representative vectors.
  • Furthermore, since the learning device 1 configured in this way learns the peripheral processing for vector quantization using the given representative vectors, there is no need to use memory for learning the representative vectors. Therefore, the learning device 1 can reduce the frequency of memory shortage problems and can process main data of a larger dimension.
  • In learning, the learning device 1 configured as described above uses, as noise, the samples located in the Voronoi region among the samples randomly generated in the (K-1)-dimensional sphere circumscribing the Voronoi region in the K-dimensional vector space. Therefore, it is possible to generate noise following a Gaussian distribution. As a result, the learning device 1 can reduce the burden required to obtain self-encoding processing using vector quantization, and hence the load required for self-encoding using vector quantization.
  • the self-encoding device 2 configured in this way performs self-encoding using vector quantization using the learning result of the learning device 1 . Therefore, the load required for self-encoding using vector quantization can be reduced.
  • the process of adding noise to the main data feature amount may be any one of the first noise adding process, the second noise adding process, or the third noise adding process.
  • the first noise adding process is a process of adding, as noise, samples in the Voronoi region among the samples randomly generated in the (K ⁇ 1)-dimensional sphere circumscribing the Voronoi region in the K-dimensional vector space. That is, the first noise adding process is the process described with reference to FIGS. 3 and 4.
  • the second noise addition process is a process of adding, as noise, samples uniformly generated within a (K-1)-dimensional sphere whose enclosed volume approximates the volume of a Voronoi region in the K-dimensional vector space.
  • the third noise addition process is a process of adding, as noise, samples uniformly generated in a hyperrectangular parallelepiped region, that is, a region that divides the vector space in which the representative vectors are arranged in a lattice, that contains one lattice point of the vector space, and that has a hyperrectangular parallelepiped shape.
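A minimal sketch of the first noise adding process, under the assumption that the circumradius of the Voronoi region and the lattice quantizer (for instance, a function like the quantize_dk sketch shown earlier) are supplied by the caller:

```python
import random
from typing import Callable, List, Sequence

def uniform_in_ball(dim: int, radius: float) -> List[float]:
    """Rejection-sample a point uniformly inside a dim-dimensional ball."""
    while True:
        v = [random.uniform(-radius, radius) for _ in range(dim)]
        if sum(c * c for c in v) <= radius * radius:
            return v

def add_voronoi_noise(target: Sequence[float],
                      quantize: Callable[[Sequence[float]], Sequence[float]],
                      circumradius: float) -> List[float]:
    """Add, to the target vector, a sample drawn uniformly from the Voronoi
    region of the origin lattice point."""
    dim = len(target)
    while True:
        candidate = uniform_in_ball(dim, circumradius)
        # Keep the sample only if the lattice quantizer maps it to the origin,
        # i.e., if it lies inside the Voronoi region of the origin.
        if all(c == 0 for c in quantize(candidate)):
            return [t + n for t, n in zip(target, candidate)]
```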
  • the learning device 1 may be implemented using a plurality of information processing devices that are communicably connected via a network.
  • each functional unit included in the learning device 1 may be distributed and implemented in a plurality of information processing devices.
  • the self-encoding device 2 may be implemented using a plurality of information processing devices communicatively connected via a network.
  • each functional unit included in the self-encoding device 2 may be distributed and implemented in a plurality of information processing devices.
  • All or part of the functions of the learning device 1 and the self-encoding device 2 are hardware such as ASIC (Application Specific Integrated Circuit), PLD (Programmable Logic Device), and FPGA (Field Programmable Gate Array). may be implemented using The program may be recorded on a computer-readable recording medium.
  • Computer-readable recording media include portable media such as flexible disks, magneto-optical disks, ROMs and CD-ROMs, and storage devices such as hard disks incorporated in computer systems.
  • the program may be transmitted over telecommunications lines.
  • Reference signs (partial list): ... Output control unit; 21 Control unit; 22 Input unit; 23 Communication unit; 24 Storage unit; 25 Output unit; 20 Self-encoding execution unit; 220 Storage control unit; 230 Communication control unit; 240 Output control unit; 91 Processor; 92 Memory; 93 Processor; 94 Memory


Abstract

A learning device according to one aspect of the present invention comprises a learning unit that executes learning to update coding and decoding processes in an autoencoding process involving using vector quantization, using a primary data feature that is a feature to be subjected to autoencoding and an auxiliary feature that is a feature of the primary data feature, and entropy-coding the result of vector-quantizing the primary data feature and entropy-coding the result of scalar-quantizing the auxiliary feature. The learning unit executes, through learning, a primary data-side probability estimation process for estimating the probability of occurrence of each element of a tensor indicating the primary data feature, and the primary data-side probability estimation process involves estimating said probability of occurrence by using the result of integrating a parameterized probability density function over an integration region that is a region obtained by dividing a vector space in which representative vectors are arranged in a lattice pattern and is a hyper-rectangular region including one lattice point of the vector space.

Description

学習装置、自己符号化装置、学習方法、自己符号化方法及びプログラムLearning device, self-encoding device, learning method, self-encoding method and program
 本発明は、学習装置、自己符号化装置、学習方法、自己符号化方法及びプログラムに関する。 The present invention relates to a learning device, a self-encoding device, a learning method, a self-encoding method and a program.
 深層学習を用いたデータ圧縮において量子化処理にはスカラー量子化とベクトル量子化の2種類があり、ベクトル量子化の方が、圧縮性能が高い。ベクトル量子化の試みとして、以下の非特許文献1や非特許文献2に記載の試みがある。非特許文献1に記載の技術は、VQVAEと呼称される技術である。非特許文献1に記載の技術では、End-to-Endでの学習はせず、特徴ベクトルの分布を一様分布と仮定して、エンコード及びデコードの最適化と代表ベクトルの最適化とを分けて実施する。非特許文献2の技術は、Soft to Hardと呼称される技術である。非特許文献2の技術は、ソフトマックス関数を用いて量子化処理を近似計算し、代表ベクトルの生起確率をヒストグラムで近似してEnd-to-Endでの学習を行う。 There are two types of quantization processing in data compression using deep learning: scalar quantization and vector quantization. Vector quantization has higher compression performance. As attempts of vector quantization, there are attempts described in Non-Patent Document 1 and Non-Patent Document 2 below. The technology described in Non-Patent Document 1 is a technology called VQVAE. In the technique described in Non-Patent Document 1, end-to-end learning is not performed, and the distribution of feature vectors is assumed to be a uniform distribution, and optimization of encoding and decoding and optimization of representative vectors are separated. to implement. The technique of Non-Patent Document 2 is a technique called Soft to Hard. The technique of Non-Patent Document 2 performs end-to-end learning by approximating the quantization process using a softmax function and approximating the probability of occurrence of representative vectors with a histogram.
 しかしながら非特許文献1に記載の技術や非特許文献2の技術など従来の技術は、ベクトル量子化を用いた自己符号化の処理を得ることに関する問題があった。問題は、例えば安定性の問題であったり、例えば処理性能の問題であったり、例えばベクトルの次元が大きい場合や高レートでの符号化を行う場合に計算量又はメモリ使用量が爆発的に増加してしまう問題であったりであった。このように従来の技術では、ベクトル量子化を用いた自己符号化の処理を得ることに要する負担が大きい場合があった。その結果、コンピュータ資源の不足によりベクトル量子化を用いた自己符号化を実現できないなど、ベクトル量子化を用いた自己符号化に要する負担が大きい場合があった。 However, conventional techniques such as the technique described in Non-Patent Document 1 and the technique of Non-Patent Document 2 have problems related to obtaining self-encoding processing using vector quantization. The problem can be, for example, a stability problem, or a processing performance problem, for example, when the dimension of the vector is large, or when encoding at a high rate, the amount of computation or memory usage increases exponentially. It was a problem that would end up. As described above, in the prior art, the burden required to obtain self-encoding processing using vector quantization was sometimes large. As a result, in some cases, the load required for self-encoding using vector quantization is large, such as the fact that self-encoding using vector quantization cannot be realized due to lack of computer resources.
 上記事情に鑑み、本発明は、ベクトル量子化を用いた自己符号化に要する負担を軽減する技術を提供することを目的としている。 In view of the above circumstances, an object of the present invention is to provide a technology that reduces the burden required for self-encoding using vector quantization.
 本発明の一態様は、ベクトル量子化を用いた自己符号化の処理であり、自己符号化の対象の特徴量である主データ特徴量と、前記主データ特徴量の特徴量である補助特徴量とを用いる自己符号化の処理であり、前記主データ特徴量をベクトル量子化した結果に対するエントロピー符号化と前記補助特徴量をスカラー量子化した結果に対するエントロピー符号化とを行う自己符号化の処理、における符号化と復号との処理を学習により更新する学習部、を備え、前記学習部は前記学習において、前記主データ特徴量を示すテンソルの各要素の生起確率を推定する主データ側確率推定処理を実行し、前記主データ側確率推定処理は、代表ベクトルが格子状に配置されたベクトル空間を分割する領域であって前記ベクトル空間の1つの格子点を含む領域であって形状が超直方体である領域を積分領域としてパラメトライズされた確率密度関数が積分された結果を用いて、前記生起確率を推定する、学習装置である。 One aspect of the present invention is a process of self-encoding using vector quantization. is a self-encoding process that performs entropy encoding on the result of vector quantization of the main data feature amount and entropy encoding on the result of scalar quantization of the auxiliary feature amount, a learning unit that updates the encoding and decoding processes in the learning by learning, wherein the learning unit estimates the probability of occurrence of each element of the tensor representing the main data feature amount in the main data side probability estimation process and the main data side probability estimation processing is performed by dividing a vector space in which the representative vectors are arranged in a grid pattern, an area including one grid point of the vector space, and having a hyperrectangular shape. The learning device estimates the occurrence probability using a result of integration of a probability density function parameterized with a given area as an integration area.
 本発明の一態様は、自己符号化の対象を取得する自己符号化対象取得部と、ベクトル量子化を用いた自己符号化の処理であり、自己符号化の対象の特徴量である主データ特徴量と、前記主データ特徴量の特徴量である補助特徴量とを用いる自己符号化の処理であり、前記主データ特徴量をベクトル量子化した結果に対するエントロピー符号化と前記補助特徴量をスカラー量子化した結果に対するエントロピー符号化とを行う自己符号化の処理、における符号化と復号との処理を学習により更新する学習部、を備え、前記学習部は前記学習において、前記主データ特徴量を示すテンソルの各要素の生起確率を推定する主データ側確率推定処理を実行し、前記主データ側確率推定処理は、代表ベクトルが格子状に配置されたベクトル空間を分割する領域であって前記ベクトル空間の1つの格子点を含む領域であって形状が超直方体である領域を積分領域としてパラメトライズされた確率密度関数が積分された結果を用いて、前記生起確率を推定する、学習装置を用いて得られた、学習済みの符号化の処理と学習済みの復号の処理とを用いて、前記自己符号化対象取得部が取得した前記対象のベクトル量子化による自己符号化を行う自己符号化実行部と、を備える自己符号化装置である。 One aspect of the present invention is a self-encoding target acquisition unit that acquires a self-encoding target, and a self-encoding process using vector quantization. and an auxiliary feature amount that is a feature amount of the main data feature amount, entropy coding of the result of vector quantization of the main data feature amount and scalar quantization of the auxiliary feature amount. a learning unit that updates the encoding and decoding processes in self-encoding processing that performs entropy encoding on the converted result by learning, and the learning unit indicates the main data feature amount in the learning. main-data-side probability estimation processing for estimating the probability of occurrence of each element of the tensor, wherein the main-data-side probability estimation processing divides a vector space in which representative vectors are arranged in a lattice, and divides the vector space into Estimate the occurrence probability using a learning device that estimates the occurrence probability using the result of integrating the probability density function parameterized as an integration region that is a region that includes one lattice point of and has a hypercube shape a self-encoding execution unit that performs self-encoding by vector quantization on the target acquired by the self-encoding target acquisition unit, using the learned encoding process and the learned decoding process, is an autoencoding device comprising:
 本発明の一態様は、ベクトル量子化を用いた自己符号化の処理であり、自己符号化の対象の特徴量である主データ特徴量と、前記主データ特徴量の特徴量である補助特徴量とを用いる自己符号化の処理であり、前記主データ特徴量をベクトル量子化した結果に対するエントロピー符号化と前記補助特徴量をスカラー量子化した結果に対するエントロピー符号化とを行う自己符号化の処理、における符号化と復号との処理を学習により更新する学習ステップ、を有し、前記学習ステップは前記学習において前記主データ特徴量を示すテンソルの各要素の生起確率を推定する主データ側確率推定処理を実行し、前記主データ側確率推定処理は、代表ベクトルが格子状に配置されたベクトル空間を分割する領域であって前記ベクトル空間の1つの格子点を含む領域であって形状が超直方体である領域を積分領域としてパラメトライズされた確率密度関数が積分された結果を用いて、前記生起確率を推定する、学習方法である。 One aspect of the present invention is a process of self-encoding using vector quantization. is a self-encoding process that performs entropy encoding on the result of vector quantization of the main data feature amount and entropy encoding on the result of scalar quantization of the auxiliary feature amount, a learning step of updating the encoding and decoding processes in the learning by learning, wherein the learning step estimates the probability of occurrence of each element of the tensor representing the main data feature amount in the learning, the main data side probability estimation process and the main data side probability estimation processing is performed by dividing a vector space in which the representative vectors are arranged in a grid pattern, an area including one grid point of the vector space, and having a hyperrectangular shape. This learning method estimates the occurrence probability using a result of integration of a probability density function parameterized with a given area as an integration area.
 本発明の一態様は、自己符号化の対象を取得する自己符号化対象取得ステップと、ベクトル量子化を用いた自己符号化の処理であり、自己符号化の対象の特徴量である主データ特徴量と、前記主データ特徴量の特徴量である補助特徴量とを用いる自己符号化の処理であり、前記主データ特徴量をベクトル量子化した結果に対するエントロピー符号化と前記補助特徴量をスカラー量子化した結果に対するエントロピー符号化とを行う自己符号化の処理、における符号化と復号との処理を学習により更新する学習部、を備え、前記学習部は前記学習において、前記主データ特徴量を示すテンソルの各要素の生起確率を推定する主データ側確率推定処理を実行し、前記主データ側確率推定処理は、代表ベクトルが格子状に配置されたベクトル空間を分割する領域であって前記ベクトル空間の1つの格子点を含む領域であって形状が超直方体である領域を積分領域としてパラメトライズされた確率密度関数が積分された結果を用いて、前記生起確率を推定する、学習装置を用いて得られた、学習済みの符号化の処理と学習済みの復号の処理とを用いて、前記自己符号化対象取得ステップが取得した前記対象のベクトル量子化による自己符号化を行う自己符号化実行ステップと、を有する自己符号化方法である。 One aspect of the present invention is a self-encoding target acquisition step for acquiring a self-encoding target, and a self-encoding process using vector quantization, wherein the main data feature is a feature quantity of the self-encoding target. and an auxiliary feature amount that is a feature amount of the main data feature amount, entropy coding of the result of vector quantization of the main data feature amount and scalar quantization of the auxiliary feature amount. a learning unit that updates the encoding and decoding processes in self-encoding processing that performs entropy encoding on the converted result by learning, and the learning unit indicates the main data feature amount in the learning. main-data-side probability estimation processing for estimating the probability of occurrence of each element of the tensor, wherein the main-data-side probability estimation processing divides a vector space in which representative vectors are arranged in a lattice, and divides the vector space into Estimate the occurrence probability using a learning device that estimates the occurrence probability using the result of integrating the probability density function parameterized as an integration region that is a region that includes one lattice point of and has a hypercube shape a self-encoding execution step of performing self-encoding by vector quantization on the target obtained by the self-encoding target obtaining step, using the learned encoding process and the learned decoding process, is a self-encoding method with .
 本発明の一態様は、上記の学習装置としてコンピュータを機能させるためのプログラムである。 One aspect of the present invention is a program for causing a computer to function as the above learning device.
 本発明の一態様は、上記の自己符号化装置としてコンピュータを機能させるためのプログラムである。 One aspect of the present invention is a program for causing a computer to function as the above self-encoding device.
 本発明により、ベクトル量子化を用いた自己符号化に要する負担を軽減することが可能となる。 The present invention makes it possible to reduce the burden required for self-encoding using vector quantization.
Brief description of the drawings: FIG. 1 is an explanatory diagram illustrating an overview of the learning device of the embodiment. FIG. 2 is an explanatory diagram illustrating LatticeVQ in the embodiment. FIG. 3 is a first explanatory diagram illustrating an example of adding noise in the embodiment. FIG. 4 is a second explanatory diagram illustrating an example of adding noise in the embodiment. FIG. 5 is an explanatory diagram illustrating the relationship between the cumulative distribution function and the occurrence probability in the embodiment. FIG. 6 is an explanatory diagram illustrating hyperrectangular parallelepiped division in the embodiment. FIG. 7 is a diagram showing an example of the flow of processing executed by the learning unit in the embodiment. FIG. 8 is a first explanatory diagram illustrating an overview of the self-encoding device in the embodiment. FIG. 9 is a second explanatory diagram illustrating an overview of the self-encoding device in the embodiment. FIG. 10 is a flowchart showing an example of the flow of processing executed by the encoder in the embodiment. FIG. 11 is a flowchart showing an example of the flow of processing executed by the decoder in the embodiment. FIG. 12 is a diagram showing an example of the hardware configuration of the learning device in the embodiment. FIG. 13 is a diagram showing an example of the configuration of the control unit included in the learning device in the embodiment. FIG. 14 is a diagram showing an example of the hardware configuration of the self-encoding device in the embodiment. FIG. 15 is a diagram showing an example of the configuration of the control unit included in the self-encoding device in the embodiment.
 (実施形態)
 図1は、実施形態の学習装置1の概要を説明する説明図である。学習装置1は、学習部10を備える。学習部10は、テンソルで表現されたデータのベクトル量子化を用いた自己符号化の性能が高まるように、学習を行う。なお、自己符号化の性能は元データと復元データの誤差Dとデータの符号量Rをラグランジュ定数λで重み付け和した量であるRDコスト(D+λR)の小ささで評価される。RD性能はRDコストが小さいほどよいことを表す。なお、学習は、機械学習を意味する。学習は、例えば深層学習である。なお、よく知られているようにデータの自己符号化とはデータの圧縮を意味する。
(embodiment)
FIG. 1 is an explanatory diagram for explaining an overview of the learning device 1 of the embodiment. The learning device 1 includes a learning section 10 . The learning unit 10 performs learning so as to improve the performance of self-encoding using vector quantization of data represented by a tensor. The performance of self-encoding is evaluated by the smallness of the RD cost (D+λR), which is the weighted sum of the error D between the original data and the restored data and the code amount R of the data with the Lagrangian constant λ. A smaller RD cost indicates better RD performance. Note that learning means machine learning. Learning is, for example, deep learning. As is well known, data self-encoding means compression of data.
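As a rough sketch of how such an RD cost could be computed during learning (the mean-squared-error distortion and the helper names are assumptions, not the publication's definitions):

```python
import math
from typing import Sequence

def rd_cost(original: Sequence[float], reconstructed: Sequence[float],
            main_probs: Sequence[float], aux_probs: Sequence[float],
            lam: float) -> float:
    # Distortion D: here a mean squared error between original and restored data.
    d = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    # Code amount R: ideal code length implied by the estimated probabilities.
    r = sum(-math.log2(p) for p in main_probs) + sum(-math.log2(p) for p in aux_probs)
    return d + lam * r
```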
The learning unit 10 includes a learning network 100 and an optimization unit 113. The learning network 100 is a neural network. The learning network 100 includes a main data acquisition unit 101, a main data side encoding unit 102, an auxiliary data side encoding unit 103, a main data side noise addition unit 104, an auxiliary data side noise addition unit 105, an auxiliary data side probability estimation unit 106, an auxiliary data side decoding unit 107, an auxiliary entropy acquisition unit 108, a main data side probability estimation unit 109, a main entropy acquisition unit 110, a main data side decoding unit 111, and a reconstruction error calculation unit 112. Although details will be described later, the optimization unit 113 updates the learning network 100 based on the output of the learning network 100.
 主データ取得部101は、テンソルで表現されたデータを主データとして取得する。テンソルで表現されたデータとは、例えば画像のデータである。テンソルで表現されたデータは、例えば1又は複数のチャネルの音声の時系列のデータであってもよい。以下、主データ取得部101が取得したデータを主データという。 The main data acquisition unit 101 acquires data represented by a tensor as main data. Data represented by a tensor is, for example, image data. The tensor-expressed data may be, for example, time-series data of one or more channels of audio. The data acquired by the main data acquisition unit 101 is hereinafter referred to as main data.
 主データ側符号化部102は、主データ特徴量取得処理を実行する。主データ特徴量取得処理は主データを符号化する処理である。符号化とは符号化の対象の特徴を示す情報を取得する処理であるので、符号化とは特徴量を示す情報を取得する処理である。 The main data side encoding unit 102 executes main data feature quantity acquisition processing. The main data feature amount acquisition process is a process of encoding the main data. Coding is a process of obtaining information indicating the characteristics of an object to be encoded, so encoding is a process of obtaining information indicating the amount of characteristics.
 したがって、主データ側符号化部102による主データの符号化とは主データの特徴量を取得する処理である。そのため主データ特徴量取得処理は、主データ特徴量を取得する処理である。主データ特徴量は符号化された主データである。主データ特徴量取得処理の内容は、学習により更新される。すなわち、主データ側符号化部102の実行する処理の内容は、学習により更新される。 Therefore, the encoding of the main data by the main data side encoding unit 102 is a process of acquiring the feature amount of the main data. Therefore, the main data feature amount acquisition process is a process of acquiring the main data feature amount. The main data feature quantity is encoded main data. The content of the main data feature quantity acquisition process is updated by learning. That is, the contents of the processing executed by main data side encoding section 102 are updated by learning.
 補助データ側符号化部103は、主データ特徴量をさらに符号化する。上述したように符号化とは特徴量を示す情報を取得する処理であるので、補助データ側符号化部103による主データ特徴量の符号化とは主データ特徴量の特徴量を取得する処理である。そこで、以下、符号化された主データがさらに符号化された情報を補助特徴量という。すなわち補助特徴量は、主データ特徴量の特徴量を示す情報である。補助特徴量は、主データ特徴量が符号化された情報であるので、補助特徴量のエントロピーは主データ特徴量よりもエントロピーの小さな情報である。 The auxiliary data side encoding unit 103 further encodes the main data feature quantity. As described above, encoding is a process of acquiring information indicating a feature amount, so encoding of the main data feature amount by the auxiliary data side encoding unit 103 is a process of acquiring the feature amount of the main data feature amount. be. Therefore, hereinafter, information obtained by further encoding the encoded main data will be referred to as an auxiliary feature amount. That is, the auxiliary feature amount is information indicating the feature amount of the main data feature amount. Since the auxiliary feature amount is information obtained by encoding the main data feature amount, the entropy of the auxiliary feature amount is information smaller than the entropy of the main data feature amount.
 以下、主データ特徴量を符号化する処理を、補助特徴量取得処理という。補助特徴量取得処理の内容は学習により更新される。すなわち、補助データ側符号化部103の実行する処理の内容は、学習により更新される。 Hereinafter, the process of encoding the main data feature quantity will be referred to as the auxiliary feature quantity acquisition process. The content of the auxiliary feature quantity acquisition process is updated by learning. That is, the contents of the processing executed by the auxiliary data side encoding unit 103 are updated by learning.
 補助特徴量取得処理の内容は、主データ統計量情報の情報量がより多く補助特徴量に含まれるように学習により更新される。主データ統計量情報は、主データを表現するテンソルの各要素の値が従う各確率分布の統計量を示す情報である。確率分布の統計量は、例えば散布度である。確率分布の統計量は、散布度だけでなく散布度と代表値との組であってもよい。 The content of the auxiliary feature acquisition process is updated through learning so that the amount of information in the main data statistics information is included in the auxiliary feature. The main data statistic information is information indicating the statistic of each probability distribution followed by the value of each element of the tensor representing the main data. A statistic of the probability distribution is, for example, the degree of dispersion. The statistic of the probability distribution may be not only the degree of dispersion but also a set of the degree of dispersion and a representative value.
 なお、情報処理や情報理論の分野において説明されるように、英文に出現する各ローマ字の出現の確率分布などのようにデータは一般に確率分布に従って出現する。主データを表現するテンソルの各要素の値もまた確率分布に従う。 In addition, as explained in the fields of information processing and information theory, data generally appears according to a probability distribution, such as the probability distribution of the appearance of each Roman character that appears in English. The value of each element of the tensor representing the main data also follows the probability distribution.
 主データ側ノイズ付与部104は、ノイズ付き主データ特徴量取得処理を実行する。ノイズ付き主データ特徴量取得処理は、主データ特徴量に対してベクトルノイズ付与処理を実行する処理である。ベクトルノイズ付与処理は、要素の数が予め定められた数K(Kは2以上の整数)であるK次元ベクトル(以下「ノイズ付与対象ベクトル」という。)を処理対象とする処理であって、処理対象に対してノイズを付与する処理である。したがって、ノイズ付き主データ特徴量取得処理は、主データ特徴量にノイズが付与された情報(以下「ノイズ付き主データ特徴量」という。)を取得する処理である。 The main data side noise addition unit 104 executes noise-added main data feature quantity acquisition processing. The noisy main data feature amount acquisition process is a process of applying vector noise to the main data feature amount. The vector noise addition process is a process for processing a K-dimensional vector (hereinafter referred to as a "noise addition target vector") having a predetermined number K (K is an integer of 2 or more) of elements, This is the process of adding noise to the processing target. Therefore, the noise-added main data feature amount acquisition process is a process for acquiring information in which noise is added to the main data feature amount (hereinafter referred to as "noise-added main data feature amount").
Specifically, the noise addition target vector is a K-dimensional vector included in the tensor that expresses the main data feature amount. A K-dimensional vector included in a tensor means a vector whose k-th element (k is an integer of 1 or more and K or less) is the k-th element of K consecutive elements among all the elements of the tensor.
 ノイズ付与対象ベクトルの要素数Kは、学習部10による学習の結果を用いたベクトル量子化を行う装置によるベクトル量子化によって、1つの符号に写像される要素の数である。以下、学習部10による学習の結果を、ネットワーク学習結果、という。したがって、ネットワーク学習結果を用いたベクトル量子化によってK個の要素がまとめて1つの符号に写像される場合、ノイズ付与対象ベクトルの要素数はKである。なお、量子化によってK個の要素がまとめて1つの符号に写像される場合、得られる符号はK次元のベクトルを示すインデックスである。 The number of elements K of the noise addition target vector is the number of elements mapped to one code by vector quantization by a device that performs vector quantization using the learning result of the learning unit 10 . The result of learning by the learning unit 10 is hereinafter referred to as a network learning result. Therefore, when K elements are collectively mapped to one code by vector quantization using the network learning result, the number of elements of the noise addition target vector is K. Note that when K elements are collectively mapped to one code by quantization, the obtained code is an index indicating a K-dimensional vector.
 したがって、ノイズ付与対象ベクトルの要素数Kは、予め決定された数である。なお、データのベクトル量子化とはデータの符号化であるので、ネットワーク学習結果を用いたベクトル量子化とはネットワーク学習結果を用いたデータの符号化を意味する。 Therefore, the number of elements K of the noise addition target vector is a predetermined number. Since vector quantization of data is data encoding, vector quantization using network learning results means data encoding using network learning results.
 ここでノイズを付与する方法について説明するが、ノイズの付与の説明に先立ち、LatticeVQについて説明する。 A method for adding noise will be described here, but before describing noise addition, LatticeVQ will be described.
<LatticeVQ>
 ベクトル量子化では代表ベクトルが必要である。Latticeは、ベクトル空間における格子点集合である。LatticeVQは、ベクトル空間において代表ベクトルを格子状に配置する場合のベクトル量子化である。すなわち、LatticeVQは、ベクトル空間において代表ベクトルが格子状に配置されるという条件を満たすベクトル量子化である。LatticeVQは特定の条件を除いて、スカラー量子化よりもRD性能が良いことが知られている。
<Lattice VQ>
Vector quantization requires a representative vector. Lattice is a set of lattice points in vector space. LatticeVQ is vector quantization when representative vectors are arranged in a lattice in a vector space. That is, LatticeVQ is vector quantization that satisfies the condition that representative vectors are arranged in a lattice in a vector space. LatticeVQ is known to have better RD performance than scalar quantization, except for certain conditions.
 図2は、実施形態におけるLatticeVQを説明する説明図である。より具体的には、図2は、ノイズ付与対象ベクトルの要素数が2である場合のLatticeVQを説明する説明図である。 FIG. 2 is an explanatory diagram explaining LatticeVQ in the embodiment. More specifically, FIG. 2 is an explanatory diagram for explaining LatticeVQ when the number of elements of the noise addition target vector is two.
FIG. 2 shows an example of an A2 lattice. In the two-dimensional case, one type of lattice is the A2 lattice. In the eight-dimensional case, one type of lattice is the E8 lattice. In the 24-dimensional case, one type of lattice is the Leech lattice. Each of these is the LatticeVQ lattice that maximizes the RD performance for a uniform distribution in its respective dimension.
 以下、代表ベクトルが格子状に配置された空間を格子空間という。すなわち、代表ベクトルは、格子空間の各格子点に位置する。図2は格子空間の一例を示す図でもある。図2の例では格子空間は2次元であるが、代表ベクトルがK次元であれば格子空間もK次元である。 The space in which the representative vectors are arranged in a lattice is hereinafter referred to as the lattice space. That is, the representative vector is located at each lattice point in the lattice space. FIG. 2 is also a diagram showing an example of a lattice space. In the example of FIG. 2, the lattice space is two-dimensional, but if the representative vectors are K-dimensional, the lattice space is also K-dimensional.
<ノイズの付与>
 それでは図3及び図4を用いてノイズの付与の一例を、要素数が2の場合を例に説明する。図3は、実施形態におけるノイズの付与の一例を説明する第1の説明図である。図4は、実施形態におけるノイズの付与の一例を説明する第2の説明図である。
<Addition of noise>
Now, an example of adding noise will be described with reference to FIGS. 3 and 4, where the number of elements is two. FIG. 3 is a first explanatory diagram illustrating an example of adding noise in the embodiment. FIG. 4 is a second explanatory diagram illustrating an example of adding noise in the embodiment.
When adding noise, first, a plurality of random K-dimensional vectors uniformly distributed within the (K-1)-dimensional sphere circumscribing the lattice unit region of the origin lattice point (hereinafter referred to as the "target lattice unit region") are generated and arranged in the lattice space. For example, in the case of FIG. 4, samples are generated until the number of interior points in FIG. 4 becomes at least one. A lattice unit region is each region resulting from dividing the lattice space into a plurality of regions so that a division condition is satisfied.
 分割条件は、各領域の大きさ及び形状が同一であり、各領域はそれぞれ格子空間の1つの格子点を含むという条件である。したがって、格子単位領域は、代表ベクトルが格子状に配置されたベクトル空間を分割する領域であってベクトル空間の1つの格子点を含む領域である。 The division condition is that each region has the same size and shape, and each region includes one grid point in the grid space. Therefore, a lattice unit area is an area that divides a vector space in which representative vectors are arranged in a lattice and includes one lattice point of the vector space.
 なお、(K―1)次元球面は、K=2であれば円であり、K=3であれば球面であり、Kが4以上であれば、超球面である。図3の領域B1が対象格子単位領域の一例である。 It should be noted that the (K−1)-dimensional sphere is a circle if K=2, a sphere if K=3, and a hypersphere if K is 4 or more. A region B1 in FIG. 3 is an example of the target lattice unit region.
 図3及び図4の例において、格子単位領域は、ボロノイ領域である。ボロノイ領域は、格子空間等の距離空間をボロノイ分割によって複数の領域に分割した結果得られる各前記領域である。 In the examples of FIGS. 3 and 4, the lattice unit regions are Voronoi regions. A Voronoi region is each region obtained by dividing a metric space such as a lattice space into a plurality of regions by Voronoi division.
 図3のノイズ点は、対象格子単位領域の外接円内に配置されたサンプルの一例である。なお、格子空間内にサンプルが配置される処理とは、具体的には、格子空間内の座標が取得される処理である。そして、格子空間における格子単位領域は格子空間内の領域であるので、任意の1つの格子単位領域の外接円内にサンプルが配置される処理とは、その外接円内の座標を取得する処理である。 The noise points in FIG. 3 are an example of samples arranged within the circumscribed circle of the target grid unit area. Note that the process of arranging the samples in the lattice space is specifically the process of acquiring the coordinates in the lattice space. Since the lattice unit area in the lattice space is an area within the lattice space, the process of arranging the samples within the circumscribed circle of any one lattice unit area is the process of obtaining the coordinates within the circumscribed circle. be.
 次に、配置されたノイズ点のうち、対象格子単位領域内に位置するノイズ点(以下「内部点」という。)と、対象格子単位領域外に位置するノイズ点(以下「除外点」という。)と、が図4の例に示すように、区別される。内部点と除外点との区別の処理は、具体的には、ノイズ点ごとに、ノイズ点の座標に基づき対象格子単位領域内の座標か対象格子単位領域外の座標かを判定する処理である。判定処理は具体的にはノイズ点を式(11)によってベクトル量子化した結果、原点格子と一致した点が領域内の座標と判定される処理である。 Next, among the arranged noise points, noise points located within the target lattice unit region (hereinafter referred to as “internal points”) and noise points located outside the target grid unit region (hereinafter referred to as “exclusion points”). ) and are distinguished as shown in the example of FIG. Specifically, the process of distinguishing between internal points and excluded points is a process of determining, for each noise point, whether the coordinates are inside the target grid unit area or outside the target grid unit area based on the coordinates of the noise point. . Specifically, the determination process is a process in which points that match the origin lattice as a result of vector quantization of the noise points by Equation (11) are determined to be coordinates within the region.
 次に、対象格子単位領域内位置する複数のノイズ点のうちの1つがランダムに選択される。次に、選択されたノイズ点が1つのノイズ付与対象ベクトルに付与される。付与の処理は具体的には、ノイズ付与対象ベクトルと選択されたノイズ点を示す位置ベクトルとが足し算される処理を意味する。 Next, one of the plurality of noise points located within the grid unit area of interest is randomly selected. Next, the selected noise points are added to one noise addition target vector. The addition process specifically means a process of adding a noise-addition target vector and a position vector indicating a selected noise point.
 このように、選択されたノイズ点を示す位置ベクトルは乱数で決まる量であるので、ノイズの一種である。そして、選択されたノイズ点を示す位置ベクトルはベクトルで表現されるので、選択されたノイズ点を示す位置ベクトルはベクトルで表現されたノイズである。そこで、以下、選択されたノイズ点を示す位置ベクトルをノイズベクトルという。このようにしてノイズ付与対象ベクトルにはノイズが付与される。 In this way, the position vector indicating the selected noise point is a type of noise because it is a quantity determined by random numbers. Since the position vector indicating the selected noise point is represented by a vector, the position vector indicating the selected noise point is noise represented by a vector. Therefore, the position vector indicating the selected noise point is hereinafter referred to as a noise vector. In this way, noise is added to the noise addition target vector.
 図1の説明に戻る。補助データ側ノイズ付与部105は、ノイズ付き補助特徴量取得処理を実行する。ノイズ付き補助特徴量取得処理は、補助特徴量に対してノイズを付与する処理である。すなわち、ノイズ付き補助特徴量取得処理は、補助特徴量に対してスカラーのノイズが付与された情報(以下「ノイズ付き補助特徴量」という。)を取得する処理である。より具体的には、補助データ側ノイズ付与部105は、補助特徴量を示すテンソルの各要素にスカラーのノイズを付与する。スカラーのノイズは例えば、-1/2以上1/2以下の一様ノイズである。 Return to the description of Figure 1. The auxiliary data-side noise addition unit 105 executes auxiliary feature quantity acquisition processing with noise. The noise-attached auxiliary feature acquisition process is a process of adding noise to the auxiliary feature. That is, the noise-attached auxiliary feature acquisition process is a process of acquiring information obtained by adding scalar noise to the auxiliary feature (hereinafter referred to as "noise-attached auxiliary feature"). More specifically, the auxiliary data noise adding unit 105 adds scalar noise to each element of the tensor representing the auxiliary feature amount. The scalar noise is, for example, uniform noise between -1/2 and 1/2.
 The auxiliary data side probability estimation unit 106 executes an auxiliary data side probability estimation process, which estimates the auxiliary data side probability from the noise-added auxiliary feature. The auxiliary data side probability is information indicating the occurrence probability of each element of the tensor representing the auxiliary feature. Estimating these occurrence probabilities uses information on the probability distribution of each element of that tensor.
 When the probability distribution of each element of the tensor representing the auxiliary feature is given in advance, the auxiliary data side probability estimation unit 106 obtains that distribution, for example by reading it from a predetermined storage device such as the storage unit 14 described later, and estimates the occurrence probability of each element based on the obtained distribution. The distribution given in advance is, for example, a probability distribution expressed using a cumulative distribution function.
 When the probability distribution of each element of the tensor representing the auxiliary feature is expressed by a parameterized auxiliary feature cumulative distribution function, the auxiliary data side probability estimation unit 106 estimates the occurrence probability of each element based on that function. The parameterized auxiliary feature cumulative distribution function is a parameterized function indicating the probability distribution of each element of the tensor representing the auxiliary feature; its parameters are, specifically, parameters that change according to statistics describing that probability distribution.
 The parameter values of the parameterized auxiliary feature cumulative distribution function are updated by learning. That is, the content of the processing executed by the auxiliary data side probability estimation unit 106 is updated by learning.
 The parameterized cumulative distribution function is, for example, a parameterized sigmoid function or softplus function. Since the parameter values of the parameterized auxiliary feature cumulative distribution function are updated by learning as described above, the parameter values of the parameterized cumulative distribution function are updated by learning.
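 As a simple illustration, a sigmoid-based parameterized cumulative distribution function could look like the sketch below; the parameter names loc and scale are assumptions standing in for whatever learnable parameters the embodiment actually uses.

```python
import numpy as np

def parametrized_cdf(q, loc, scale):
    """A parameterized cumulative distribution function built from a sigmoid:
    it satisfies cdf(-inf) = 0, cdf(inf) = 1 and is monotonically increasing
    whenever scale > 0.  loc and scale play the role of the parameters that
    would be updated by learning."""
    return 1.0 / (1.0 + np.exp(-(q - loc) / scale))
```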
 Here, the relationship between a cumulative distribution function and an occurrence probability is explained for the case of a one-dimensional random variable. FIG. 5 is an explanatory diagram illustrating this relationship in the embodiment. Image G1 shows an example of a probability density function of the random variable q. Δ denotes the quantization step size. Image G1 shows that the value obtained by integrating the probability density function over the range Δ centred on the value q is the occurrence probability p(q) of the value q. In the one-dimensional case of FIG. 5, the step size Δ is, for example, the size of the closed interval [-1/2, 1/2], that is, 1.
 Image G2 shows the cumulative distribution function cdf(q) obtained by integrating the probability density function of image G1. The integral of a probability density function is a monotonically increasing function such as a sigmoid function, regardless of the shape of the density. More precisely, a cumulative distribution function satisfies cdf(-∞) = 0, cdf(∞) = 1, and the derivative of cdf(q) with respect to q is greater than or equal to 0.
 Image G2 shows that the occurrence probability p(q) equals cdf(q + Δ/2) - cdf(q - Δ/2). A small numerical sketch of this relationship is given below, after which the description returns to FIG. 1.
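 The sketch below evaluates p(q) = cdf(q + Δ/2) - cdf(q - Δ/2) for a standard Gaussian; the Gaussian is chosen only as a convenient example of a cumulative distribution function, not because the embodiment is limited to it.

```python
import math

def gaussian_cdf(q, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a Gaussian, used here only as a
    concrete example of a monotonically increasing cdf."""
    return 0.5 * (1.0 + math.erf((q - mu) / (sigma * math.sqrt(2.0))))

def occurrence_probability(q, delta=1.0, mu=0.0, sigma=1.0):
    """p(q) = cdf(q + delta/2) - cdf(q - delta/2): the probability mass of the
    quantization bin of width delta centred on q."""
    return gaussian_cdf(q + delta / 2, mu, sigma) - gaussian_cdf(q - delta / 2, mu, sigma)

# e.g. occurrence_probability(0.0) is about 0.383 for a standard Gaussian and delta = 1.
```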
 The auxiliary data side decoding unit 107 executes an auxiliary feature decoding process. This process takes information obtained from the auxiliary feature as its input and decodes it; for the auxiliary data side decoding unit 107, the input is the noise-added auxiliary feature. Hereinafter, the information obtained by decoding the noise-added auxiliary feature is referred to as auxiliary data. The auxiliary data side decoding unit 107 therefore acquires the auxiliary data by decoding the noise-added auxiliary feature.
 As described above, the more the content of the auxiliary feature acquisition process is updated by learning, the smaller the RD cost becomes. Accordingly, as the content of the auxiliary feature acquisition process is updated, the maximum amount of main data statistic information that the auxiliary data can contain also increases.
 The content of the auxiliary feature decoding process is updated by learning so that the RD cost becomes smaller. That is, the content of the processing executed by the auxiliary data side decoding unit 107 is updated by learning.
 The auxiliary entropy acquisition unit 108 acquires the auxiliary entropy based on the auxiliary data side probability estimated by the auxiliary data side probability estimation unit 106. The auxiliary entropy is the entropy of the auxiliary feature.
 The main data side probability estimation unit 109 executes a main data side probability estimation process, which estimates the main data side probability from the noise-added main data feature and the auxiliary data. The main data side probability is information indicating the occurrence probability of each element of the tensor representing the main data feature. Estimating these occurrence probabilities uses information on the probability distribution of each element of that tensor.
 When the probability distribution of each element of the tensor representing the main data feature is expressed using a parameterized main data feature cumulative distribution function, the main data side probability estimation unit 109 estimates the occurrence probability of each element based on that function.
 The parameterized main data feature cumulative distribution function is a parameterized cumulative distribution function indicating the probability distribution of each element of the tensor representing the main data feature. The probability distribution is, for example, a Gaussian distribution.
 The parameters of the parameterized main data feature cumulative distribution function are, specifically, statistics representing the probability distribution of each element of the tensor representing the main data feature. In learning, therefore, the values indicated by the auxiliary data are used as the parameter values of the parameterized main data feature cumulative distribution function.
 For example, when the probability distribution is a Gaussian distribution, the auxiliary data obtained in the trained learning network 100 indicates the representative value and the degree of dispersion of the Gaussian distribution.
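 Written out for this Gaussian case (a sketch, with the mean μ and standard deviation σ assumed to be the representative value and dispersion supplied by the auxiliary data), the parameterized cumulative distribution function takes the standard form

\[ \mathrm{cdf}_{\mu,\sigma}(q) \;=\; \frac{1}{2}\left(1 + \operatorname{erf}\!\left(\frac{q-\mu}{\sigma\sqrt{2}}\right)\right). \]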
<Estimation of Occurrence Probability>
 An example of the main data side probability estimation process is described here; more specifically, an example of the process when the LatticeVQ described above is used. When vector quantization is performed, each vector of the main data is represented by one of the representative vectors. The description of the main data side probability estimation process for LatticeVQ is therefore, more concretely, a description of the process of estimating the occurrence probability of a representative vector when LatticeVQ is used. Hereinafter, this process is referred to as the representative vector occurrence probability estimation process; it is an example of the main data side probability estimation process.
<Representative Vector Occurrence Probability Estimation Process>
 As described above, the main data side probability estimation process uses the parameterized main data feature cumulative distribution function, which is a parameterized cumulative distribution function, and a cumulative distribution function is the result of integrating a probability density function. The representative vector occurrence probability estimation process is therefore a process that uses a cumulative distribution function; using a cumulative distribution function here means using the result of integrating a parameterized probability density function over a lattice unit region.
 As described above, a Voronoi region is one type of lattice unit region. Because Voronoi tessellation is a well-known method of dividing space, it is often used to obtain lattice unit regions. When the lattice space is two-dimensional, however, the Voronoi region is hexagonal, as described above.
 When integrating a function over a manifold, performing the integration is not necessarily easy when the integration region is hexagonal. Here, "easy to perform" means that the amount of computation required to obtain the result with at least a predetermined accuracy is small.
 Moreover, when the manifold has more than two dimensions, integrating a function over a region obtained by Voronoi tessellation is even harder than in the two-dimensional case. In four dimensions, for example, it is often difficult even to depict the shape of the Voronoi region, and integrating a function over such a region is not easy either. Thus, integrating a function with a region obtained by Voronoi tessellation as the integration region is not necessarily easy.
 Consequently, obtaining the parameterized cumulative distribution function used in the representative vector occurrence probability estimation process may not be easy.
 In the representative vector occurrence probability estimation process, therefore, a parameterized cumulative distribution function obtained from the result of hyperrectangular partitioning is used instead of Voronoi tessellation. Hyperrectangular partitioning divides the lattice space into lattice unit regions whose shape is a hyperrectangle. A two-dimensional hyperrectangle is a rectangle, and a three-dimensional hyperrectangle is a rectangular parallelepiped.
 FIG. 6 is an explanatory diagram of hyperrectangular partitioning in the embodiment. More specifically, FIG. 6 shows an example of the result of hyperrectangular partitioning of a two-dimensional lattice space together with an example of the result of Voronoi tessellation. Both the "true region" and the "approximate region" in FIG. 6 are examples of lattice unit regions.
 The "true region" is the region obtained by Voronoi tessellation, that is, a Voronoi region; it is a hexagonal lattice unit region. The "approximate region" is the lattice unit region obtained by hyperrectangular partitioning.
 The shape of the "approximate region" is a hyperrectangle (a rectangle in two dimensions). Of the two partitionings shown in FIG. 6, an example of the hyperrectangular partitioning executed by the main data side probability estimation unit 109 is the partitioning into "approximate regions". In FIG. 6, S_1 is the side length of the "approximate region" along the first dimension of the two-dimensional lattice space, and S_2 is the side length along the second dimension.
 The side length along the first dimension means the length of the region in one of two orthogonal one-dimensional projection subspaces when the region in the lattice space is projected onto them; the side length along the second dimension is its length in the other projection subspace. In the following description, the length of the n-th side of a hyperrectangle in an N-dimensional lattice space means the length of the hyperrectangle in the n-th projection subspace when the hyperrectangle is projected onto N orthogonal one-dimensional projection subspaces. The projection of a hyperrectangle onto such a subspace is a straight line segment.
 An integral over a region of two or more dimensions can be obtained by iterated integration. Integrating a function over an N-dimensional hyperrectangle is performed as an iterated integral consisting of N one-dimensional integrations, and for a hyperrectangle each of these N integrations can use one side of the hyperrectangle as its integration range.
 When iterated integration is performed with the sides of the hyperrectangle as the integration ranges, each integration is unaffected by the results of the other integrations. For a hyperpolyhedron that is not a hyperrectangle, by contrast, each integration in the iterated integral depends on the results of the others. Integration is therefore easier when the lattice unit region is a hyperrectangle than when it is a hyperpolyhedron that is not a hyperrectangle.
 In the example of FIG. 6, the "true region" is hexagonal and the "approximate region" is a rectangle, so integration over the "approximate region" is easier than integration over the "true region".
 When the parameterized cumulative distribution function is obtained with the hyperrectangle resulting from hyperrectangular partitioning as the integration region, each step of the iterated integration integrates a one-dimensional probability density function that is unaffected by the other dimensions.
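 Written out for a density that factorizes across dimensions (the centre c_i of the region of lattice point i and the side lengths s_j are notation introduced here for illustration), the integral over the hyperrectangular region R_i separates into a product of one-dimensional integrals:

\[ \int_{R_i} p(q)\,dq \;=\; \prod_{j=1}^{N} \int_{c_i[j]-s_j/2}^{c_i[j]+s_j/2} p_j(q_j)\,dq_j \;=\; \prod_{j=1}^{N} \Bigl(\mathrm{cdf}_j\bigl(c_i[j]+\tfrac{s_j}{2}\bigr) - \mathrm{cdf}_j\bigl(c_i[j]-\tfrac{s_j}{2}\bigr)\Bigr), \]

 which is exactly the form taken by equations (1) and (2) below.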
 In the learning network 100, the probability density function subjected to iterated integration is a parameterized function representing a predetermined type of distribution, here a Gaussian distribution, and the values of its parameters depend on the auxiliary data. The parameterized cumulative distribution function is obtained by iterated integration of this probability density function over the hyperrectangle obtained by hyperrectangular partitioning. Since the parameter values of the probability density function are the values of the auxiliary data, the parameterized cumulative distribution function obtained in this way is a function that depends on the auxiliary data.
 In the representative vector occurrence probability estimation process, the values of the auxiliary data are substituted into the parameters of the cumulative distribution function obtained in advance in this way, and the resulting cumulative distribution function is used to estimate the occurrence probability of each representative vector.
 In the representative vector occurrence probability estimation process, the occurrence probability of a representative vector is estimated, for example, by the processing expressed by equations (1) and (2) below. The cumulative distribution function cdf in equation (2) denotes the cumulative distribution function obtained at each step of the iterated integration; integrating an m-dimensional cumulative distribution function (m being a natural number) over one dimension yields an (m-1)-dimensional cumulative distribution function.
 The right-hand side of equation (1) is an example of the occurrence probability obtained using the parameterized main data feature cumulative distribution function; equation (1) is parameterized through the cumulative distribution function cdf of equation (2). Equation (2) is the cumulative distribution function obtained by integrating the probability density function along one dimension of the lattice unit region, and equation (1) is the product of the per-dimension occurrence probabilities given by equation (2). Equation (1) therefore expresses the occurrence probability using the result of integrating the probability density function over the entire lattice unit region.
\[ p_i = \prod_{j} p_i[j] \tag{1} \]
\[ p_i[j] = \mathrm{cdf}\!\left(\hat{y}_i[j] + \tfrac{\Delta_j}{2}\right) - \mathrm{cdf}\!\left(\hat{y}_i[j] - \tfrac{\Delta_j}{2}\right) \tag{2} \]
 Here, i is an identifier of each lattice point and j denotes a dimension of the lattice space. The left-hand side of equation (1) is the occurrence probability of a representative vector: since each lattice point corresponds to a representative vector, the occurrence probability p_i at a lattice point means the occurrence probability p_i of the corresponding representative vector. Δ_j denotes the length of the lattice unit region along the dimension identified by j; since the shape and size of the lattice unit region are predetermined, Δ_j is a predetermined length.
 The symbol y with a hat, shown in equation (3) below, denotes the noise-added main data feature. Hereinafter a symbol A with a hat is written A^; thus the hatted symbol y of equation (3) is written y^.
\[ \hat{y} \tag{3} \]
 Equation (1) shows that the occurrence probability p_i is expressed as the product of the probabilities p_i[j] obtained for each dimension. If the lattice unit region were not a hyperrectangle, p_i could not be expressed as such a simple product. Because the lattice unit region is a hyperrectangle, the main data side probability estimation unit 109 can easily obtain the occurrence probability p_i, as shown by equations (1) and (2).
 Equations (1) and (2) are derived under the assumption that the covariance between dimensions is zero. When the lattice unit region is a hyperrectangle, however, the covariance between dimensions can be made zero by rotating the coordinate axes of the lattice space so that they are parallel to the sides of the hyperrectangle. Equations (1) and (2) therefore hold, via such a rotation of the coordinate axes, even when the covariance between dimensions is not zero. As is well known in linear algebra, a rotation of the coordinate axes is a unitary transformation and does not change the content of equations (1) and (2). Making the covariance zero corresponds to diagonalizing the matrix of covariances between dimensions.
 In this way, the main data side probability estimation process estimates the occurrence probability of each element of the tensor representing the main data feature by using the result of integrating the parameterized probability density function over a lattice unit region whose shape is a hyperrectangle.
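 A minimal sketch of equations (1) and (2) for the Gaussian case follows. The per-dimension mean mu[j] and standard deviation sigma[j] are assumed to be the parameters supplied by the auxiliary data, and sides[j] is the side length of the hyperrectangular lattice unit region in dimension j; the function names are illustrative only.

```python
import math

def gaussian_cdf(q, mu, sigma):
    """One-dimensional Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((q - mu) / (sigma * math.sqrt(2.0))))

def lattice_point_probability(y_hat, mu, sigma, sides):
    """Estimate the occurrence probability of the representative vector for a
    (noisy) feature vector y_hat as the product over dimensions of
    cdf(y_hat[j] + s_j/2) - cdf(y_hat[j] - s_j/2), following equations (1), (2)."""
    p = 1.0
    for j in range(len(y_hat)):
        upper = gaussian_cdf(y_hat[j] + sides[j] / 2, mu[j], sigma[j])
        lower = gaussian_cdf(y_hat[j] - sides[j] / 2, mu[j], sigma[j])
        p *= upper - lower
    return p
```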
<Processing for Determining the Range of the Hyperrectangle>
 The use of a hyperrectangle as the lattice unit region has been described so far, but not how the shape and size of that hyperrectangle are determined. An example of the process of determining the shape and size of a lattice unit region that at least satisfies the condition of being a hyperrectangle (hereinafter referred to as the "hyperrectangle determination process") is described here.
 In the hyperrectangle determination process, the lattice points adjacent to the origin lattice point are first calculated. The origin lattice point is the lattice point located at the origin of the lattice space, and the adjacent lattice points are the lattice points next to the origin. A hyperrectangle satisfying the hyperrectangle determination conditions is then calculated. These conditions include the condition that the lattice unit region of the origin lattice point does not overlap the lattice unit region of any adjacent lattice point, and the condition that the volume of the hyperrectangle equals the volume of the Voronoi region.
 As a concrete example of the hyperrectangle determination process, the conditions used to determine the shape and size of the hyperrectangle in the eight-dimensional E8 lattice space are described. There are two types of lattice points adjacent to the origin lattice point: [±1^2, 0^6] and [±(1/2)^8], where the superscript indicates the number of dimensions; for example, [1^2, 0^6] means [1, 1, 0, 0, 0, 0, 0, 0]. The square brackets [ ] denote a vector.
<Regarding the Notation for Hyperrectangles>
 The superscripts in the hyperrectangle notation [s_1^a, s_2^b, ..., s_3^c] indicate that each side from the 1st to the a-th dimension has length s_1, that each side from the (a+1)-th to the (a+b)-th dimension has length s_2, and so on. In this notation, a superscript attached to a symbol s indicates that, for a run of consecutive dimensions whose number equals the superscript, the hyperrectangle has the same side length s; that is, s^a indicates that the side length of the hyperrectangle is s for a run of a consecutive dimensions. Determining whether dimensions are consecutive requires information on the order of the dimensions, and that order is predetermined.
 The "±" in [±1^2, 0^6] indicates that both +1 and -1 occur. [±1^2, 0^6] therefore stands for the four vectors [1, 1, 0^6], [-1, 1, 0^6], [1, -1, 0^6] and [-1, -1, 0^6].
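 As a small illustration of this run-length notation (the helper name expand_sides is hypothetical), the side-length list can be expanded as follows.

```python
def expand_sides(spec):
    """Expand run-length side-length notation, e.g.
    [(0.5, 1), (1.0, 6), (2.0, 1)] -> [0.5, 1, 1, 1, 1, 1, 1, 2],
    which corresponds to the notation [1/2, 1^6, 2]."""
    sides = []
    for value, count in spec:
        sides.extend([value] * count)
    return sides

# expand_sides([(0.5, 1), (1.0, 6), (2.0, 1)])
# -> [0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]
```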
 Returning to the determination of the shape and size of the hyperrectangle in the eight-dimensional lattice space: the conditions under which the hyperrectangle of each adjacent lattice point does not overlap the hyperrectangle of the origin lattice point are the following two. One is that at least seven of the sides (elements) of the hyperrectangle are 1 or less; the other is that at least one of the sides (elements) is 1/2 or less.
 In addition, since the volume of the Voronoi region of the E8 lattice is 1, the hyperrectangle satisfies the condition of equation (4) below.
\[ \prod_{j=1}^{8} s_j = 1 \tag{4} \]
 A hyperrectangle satisfying these conditions in the eight-dimensional E8 lattice space is, for example, s = [1/2, 1^6, 2].
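 The sketch below simply checks the three conditions quoted above for a candidate side-length vector; it is illustrative only and not part of the described apparatus.

```python
import math

def satisfies_e8_conditions(sides, tol=1e-9):
    """Check, for the E8 lattice, that (a) at least seven side lengths are
    <= 1, (b) at least one side length is <= 1/2, and (c) the volume of the
    hyperrectangle equals the Voronoi volume (= 1)."""
    cond_a = sum(1 for s in sides if s <= 1.0) >= 7
    cond_b = any(s <= 0.5 for s in sides)
    cond_c = math.isclose(math.prod(sides), 1.0, rel_tol=tol, abs_tol=tol)
    return cond_a and cond_b and cond_c

# s = [1/2, 1^6, 2]
print(satisfies_e8_conditions([0.5, 1, 1, 1, 1, 1, 1, 2]))  # True
```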
 Examples for other dimensions are given next. In the case of the two-dimensional A2 lattice space, the range of the lattice unit region is the range enclosed by the hyperrectangle expressed by equation (5) below.
[Equation (5): side lengths of the hyperrectangle for the two-dimensional A2 lattice space (equation image in the original document)]
 In the case of the eight-dimensional E8 lattice space, the range of the lattice unit region is the range enclosed by the hyperrectangle expressed by equation (6) below.
[Equation (6): side lengths of the hyperrectangle for the eight-dimensional E8 lattice space (equation image in the original document)]
 In the case of the 24-dimensional Leech lattice space, the range of the lattice unit region is the range enclosed by the hyperrectangle expressed by equation (7) below.
[Equation (7): side lengths of the hyperrectangle for the 24-dimensional Leech lattice space (equation image in the original document)]
 In this way, the range of the hyperrectangle (the shape and size of the lattice unit region) is determined. The description now returns to FIG. 1.
 The main entropy acquisition unit 110 acquires the entropy of the main data feature based on the estimation result of the main data side probability estimation unit 109. Hereinafter, the entropy of the main data feature is referred to as the main entropy.
 The main data side decoding unit 111 executes a main data feature decoding process. This process takes information obtained from the main data feature as its input and decodes it; for the main data side decoding unit 111, the input is the noise-added main data feature. The content of the decoding performed by the main data side decoding unit 111 is updated by learning.
 The reconstruction error calculation unit 112 calculates the difference between the decoding result of the main data side decoding unit 111 and the main data acquired by the main data acquisition unit 101. Hereinafter, this difference is referred to as the reconstruction error. It may be expressed, for example, as a mean squared error or as a binary cross entropy.
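 A minimal sketch of the two error measures mentioned above, using PyTorch as an assumed framework; the function name is illustrative only.

```python
import torch
import torch.nn.functional as F

def reconstruction_error(x, x_hat, mode="mse"):
    """Difference between the decoded result and the original main data,
    expressed either as a mean squared error or as a binary cross entropy."""
    if mode == "mse":
        return torch.mean((x - x_hat) ** 2)
    # binary cross entropy assumes x and x_hat take values in [0, 1]
    return F.binary_cross_entropy(x_hat, x)
```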
 The optimization unit 113 updates the learning network 100 based on the auxiliary entropy, the main entropy and the reconstruction error, all of which are examples of outputs of the learning network 100. Specifically, the optimization unit 113 updates the learning network 100 so as to reduce the optimization error, the main entropy and the auxiliary entropy. The objective function used by the optimization unit 113 is, for example, L = D + λ(R_y + R_z).
 The symbol L denotes the objective function, D the reconstruction error, and λ a predetermined constant. R_y denotes the main entropy and R_z the auxiliary entropy.
 Since a small entropy means a short code length, the optimization unit 113 updates the learning network 100 so as to reduce the entropies. Since a smaller optimization error means more accurate self-encoding, the optimization unit 113 also updates the learning network 100 so as to reduce the optimization error.
 The learning network 100 is updated, for example, by solving the minimization problem of the objective function L with a gradient method; that is, the learning network 100 is updated by updating the values of its parameters by, for example, error backpropagation.
 Updating the learning network 100 specifically means updating the content of the processing of the main data side encoding unit 102, the auxiliary data side encoding unit 103, the auxiliary data side probability estimation unit 106, the auxiliary data side decoding unit 107 and the main data side decoding unit 111.
<Relationship between Self-Encoding and the Learning Network 100>
 The learning network 100 includes the main data side noise addition unit 104. The main data side noise addition unit 104 itself does not perform quantization. It is precisely because the learning network 100 includes the processing of the main data side noise addition unit 104, however, that learning improves the performance of self-encoding using vector quantization. The reason is as follows.
 For example, as described in reference 1 below, if vector quantization is performed during learning, the gradient becomes zero, and it is known that learning that improves the efficiency of self-encoding using vector quantization cannot then be carried out. Therefore, in order to make the self-encoding process more efficient through learning, a process that adds noise to the vectors is performed during learning instead of vector quantization itself.
 By adding noise instead of performing vector quantization itself, the performance of the other processes that generate the information used for vector quantization, rather than the vector quantization itself, is improved. As a result, even though vector quantization is performed in place of noise addition at the time of self-encoding, self-encoding is more efficient than before learning.
 Reference 1: Balle, "Variational image compression with a scale hyperprior," 2018
 In the learning unit 10, the main data side noise addition unit 104 adds noise to the noise addition target vectors, so the gradient does not become zero when learning is performed including the main data side noise addition unit 104. As a result, it is possible to update the content of the main data side encoding unit 102, the auxiliary data side encoding unit 103, the auxiliary data side probability estimation unit 106, the auxiliary data side decoding unit 107 and the main data side decoding unit 111, which are also used in self-encoding with vector quantization.
 What has been referred to so far as the network learning result therefore means, specifically, the peripheral processing updated by training the neural network that applies noise instead of vector quantization (that is, the learning network 100). The peripheral processing is the processing that generates the information used in vector quantization; concretely, it is the processing executed by each of the main data side encoding unit 102, the auxiliary data side encoding unit 103, the auxiliary data side probability estimation unit 106, the auxiliary data side decoding unit 107 and the main data side decoding unit 111.
 FIG. 7 shows an example of the flow of the processing executed by the learning unit 10 in the embodiment. The main data acquisition unit 101 acquires the main data x = [x_1, x_2, ..., x_N] (step S101). The main data x is an N-dimensional vector with N elements x_1 to x_N, each of which is a tensor; each element may therefore be a scalar or a vector.
 Next, the main data side encoding unit 102 executes the main data feature acquisition process (step S102); that is, it encodes the main data x. Encoding the main data yields the main data feature y = f_enc(x), where the function f_enc(x) expresses the encoding of the main data x (hereinafter the "main data encoding function"). The main data feature y is a tensor consisting of k K-dimensional vectors.
 Next, the auxiliary data side encoding unit 103 executes the auxiliary feature acquisition process (step S103), which yields the auxiliary feature z = g_enc(y), where the function g_enc(y) expresses the encoding of the main data feature y (hereinafter the "main data feature encoding function"). The auxiliary feature z is a tensor such as a vector.
 Next, the main data side noise addition unit 104 executes the noise-added main data feature acquisition process (step S104), which adds noise to the main data feature. This yields the noise-added main data feature y^ = [y_1^, y_2^, ..., y_k^], where y_i^ = y_i + u_y, y_i is the i-th vector element of the main data feature y, and u_y denotes the noise.
 Next, the auxiliary data side noise addition unit 105 executes the noise-added auxiliary feature acquisition process (step S105), which adds noise to the auxiliary feature. This yields the noise-added auxiliary feature z^ = [z_1^, z_2^, ..., z_w^], where w is an integer of 1 or more, z_i^ = z_i + u_z, z_i is the i-th element of the auxiliary feature z, and u_z denotes the noise.
 Next, the auxiliary data side probability estimation unit 106 executes the auxiliary data side probability estimation process (step S106) and thereby estimates the auxiliary data side probability, which is expressed specifically by equation (8) below.
\[ p(\hat{z}_i) = h\!\left(\hat{z}_i + \tfrac{1}{2}\right) - h\!\left(\hat{z}_i - \tfrac{1}{2}\right) \tag{8} \]
 The symbol on the left-hand side of equation (8) denotes the auxiliary data side probability, and h is the parameterized auxiliary feature cumulative distribution function.
 Next, the auxiliary data side decoding unit 107 executes the auxiliary feature decoding process on the noise-added auxiliary feature (step S107), thereby decoding it. This yields the auxiliary data θ = g_dec(z^), where the function g_dec(z^) expresses the decoding of the noise-added auxiliary feature z^ (hereinafter the "auxiliary feature decoding function"). The auxiliary data θ is a tensor such as a vector.
 Next, the auxiliary entropy acquisition unit 108 acquires the auxiliary entropy based on the auxiliary data side probability (step S108), specifically by executing the processing expressed by equation (9) below.
\[ R_z = -\sum_{i} \log_2 p(\hat{z}_i) \tag{9} \]
 The symbol on the left-hand side of equation (9) denotes the auxiliary entropy.
 Next, the main data side probability estimation unit 109 executes the main data side probability estimation process (step S109) and thereby estimates the main data side probability from the noise-added main data feature and the auxiliary data; the main data side probability is expressed specifically by equation (1) above.
 Next, the main entropy acquisition unit 110 acquires the main entropy based on the main data side probability (step S110), specifically by executing the processing expressed by equation (10) below.
\[ R_y = -\sum_{i} \log_2 p_i \tag{10} \]
 The symbol on the left-hand side of equation (10) denotes the main entropy.
 Next, the main data side decoding unit 111 executes the main data feature decoding process on the noise-added main data feature (step S111), thereby decoding it. Hereinafter, the information obtained by decoding the noise-added main data feature is referred to as the decoded main data. The main data side decoding unit 111 thus obtains the decoded main data x^ = f_dec(y^), where the function f_dec(y^) expresses the decoding of the noise-added main data feature y^ (hereinafter the "main data feature decoding function"). The decoded main data x^ is a tensor such as a vector.
 Next, the reconstruction error calculation unit 112 acquires the difference between the decoded main data and the main data acquired by the main data acquisition unit 101 (step S112); this difference is the reconstruction error.
 Next, the optimization unit 113 updates the learning network 100 based on the auxiliary entropy, the main entropy and the reconstruction error (step S113). The optimization unit 113 then determines whether a predetermined termination condition for learning (hereinafter the "learning termination condition") is satisfied (step S114). The learning termination condition is, for example, that the learning network 100 has been updated a predetermined number of times.
 If the learning termination condition is satisfied (step S114: YES), the processing ends; if it is not satisfied (step S114: NO), the processing returns to step S101. The peripheral processing at the time the learning termination condition is satisfied is used for vector quantization as the trained peripheral processing.
 The steps S101 to S114 may be executed in any order that does not violate causality.
 As described above, the auxiliary data side probability estimation unit 106 may obtain a probability distribution given in advance, for example by reading it from a predetermined storage device. In that case, the content of the auxiliary data side probability estimation process is not updated by learning, so the trained auxiliary data side probability estimation process is the same as the process before learning.
 Updating the content of the auxiliary data side probability estimation process means, more specifically, updating the parameterized auxiliary feature cumulative distribution function h. Therefore, when the auxiliary data side probability estimation unit 106 obtains a probability distribution given in advance by reading it from a predetermined storage device or the like, the trained parameterized auxiliary feature cumulative distribution function h is the same as the one before learning.
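 A minimal sketch of one training iteration (steps S102 to S113) is given below, using PyTorch as an assumed framework. The callables f_enc, g_enc, g_dec, f_dec, aux_prob, main_prob, add_lattice_noise and the optimizer are assumed to be supplied; all names are illustrative and not the actual API of the described apparatus.

```python
import torch

def training_step(x, f_enc, g_enc, g_dec, f_dec, aux_prob, main_prob,
                  add_lattice_noise, optimizer, lam):
    y = f_enc(x)                               # S102: main data feature y = f_enc(x)
    z = g_enc(y)                               # S103: auxiliary feature z = g_enc(y)
    y_hat = add_lattice_noise(y)               # S104: add vector noise to y
    z_hat = z + (torch.rand_like(z) - 0.5)     # S105: add uniform noise in [-1/2, 1/2]
    p_z = aux_prob(z_hat)                      # S106: auxiliary data side probabilities
    theta = g_dec(z_hat)                       # S107: auxiliary data theta = g_dec(z_hat)
    R_z = -torch.sum(torch.log2(p_z))          # S108: auxiliary entropy
    p_y = main_prob(y_hat, theta)              # S109: main data side probabilities
    R_y = -torch.sum(torch.log2(p_y))          # S110: main entropy
    x_hat = f_dec(y_hat)                       # S111: decoded main data
    D = torch.mean((x - x_hat) ** 2)           # S112: reconstruction error (MSE here)
    loss = D + lam * (R_y + R_z)               # S113: objective L = D + lambda(R_y + R_z)
    optimizer.zero_grad()
    loss.backward()                            # error backpropagation
    optimizer.step()                           # gradient-based parameter update
    return loss.item()
```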
 In this way, the learning unit 10 updates, by learning, the encoding and decoding processes of self-encoding using vector quantization. Self-encoding using vector quantization uses the main data feature, which is the feature of the self-encoding target, and the auxiliary feature, which is the feature of the main data feature. It further performs entropy coding of the result of vector quantization of the main data feature and entropy coding of the result of scalar quantization of the auxiliary feature.
 The encoding processes in such self-encoding using vector quantization are, specifically, the main data feature acquisition process executed by the main data side encoding unit 102 and the auxiliary feature acquisition process executed by the auxiliary data side encoding unit 103. The decoding processes are, specifically, the auxiliary feature decoding process executed by the auxiliary data side decoding unit 107 and the main data feature decoding process executed by the main data side decoding unit 111.
 Self-encoding using vector quantization with the trained peripheral processing is described with reference to FIGS. 8 and 9. "Trained" here means at the time the learning termination condition is satisfied. More specifically, the self-encoding device 2, which executes encoding and decoding, is described as an example of a device that performs self-encoding using vector quantization with the trained peripheral processing. The self-encoding device 2 is a type of autoencoder.
 FIG. 8 is a first explanatory diagram outlining the self-encoding device 2 in the embodiment, and FIG. 9 is a second explanatory diagram. More specifically, FIG. 8 illustrates the encoding processing executed by the self-encoding device 2, and FIG. 9 illustrates the decoding processing executed by the self-encoding device 2.
 Being a type of autoencoder, the self-encoding device 2 comprises an encoder and a decoder; specifically, it comprises an encoder 200 and a decoder 212. The encoder 200 comprises a self-encoding target acquisition unit 201, a trained main data side encoding unit 202, a trained auxiliary data side encoding unit 203, a vector quantization unit 204, a scalar quantization unit 205, a trained auxiliary data side probability estimation unit 206, a trained auxiliary data side decoding unit 207, an auxiliary entropy encoding unit 208, a main data side probability estimation unit 209, a main entropy encoding unit 210 and a data multiplexing unit 211.
 The decoder 212 comprises an encoded data acquisition unit 213, a data separation unit 214, an auxiliary entropy decoding unit 215, a trained auxiliary data side decoding unit 216, a main entropy decoding unit 217 and a trained main data side decoding unit 218.
 The self-encoding target acquisition unit 201 acquires the data to be self-encoded as the main data. Hereinafter, the target of self-encoding is referred to as the self-encoding target.
 The trained main data side encoding unit 202 executes the trained main data feature acquisition process on the self-encoding target and thereby acquires the main data feature of the self-encoding target.
 The trained auxiliary data side encoding unit 203 executes the trained auxiliary feature acquisition process on the main data feature of the self-encoding target and thereby acquires the auxiliary feature of the self-encoding target.
 The vector quantization unit 204 executes vector quantization on the main data feature of the self-encoding target and thereby obtains the vector-quantized main data feature of the self-encoding target (hereinafter the "vector quantized feature").
 The scalar quantization unit 205 executes scalar quantization on the auxiliary feature of the self-encoding target and thereby obtains the scalar-quantized auxiliary feature of the self-encoding target (hereinafter the "scalar quantized feature").
 The trained auxiliary data side probability estimation unit 206 executes the trained auxiliary data side probability estimation process and thereby estimates the auxiliary data side probability of the self-encoding target based on the scalar quantized feature.
 学習済み補助データ側復号部207は、スカラー量子化特徴量に対して学習済みの補助特徴量復号処理を実行する。すなわち、学習済み補助データ側復号部207は、スカラー量子化特徴量を復号する。以下、スカラー量子化特徴量が復号された情報を量子化補助データという。したがって、学習済み補助データ側復号部207は、スカラー量子化特徴量を復号することで量子化補助データを取得する処理である。 The learned auxiliary data side decoding unit 207 executes learned auxiliary feature quantity decoding processing on the scalar quantized feature quantity. That is, the learned auxiliary data side decoding unit 207 decodes the scalar quantized feature quantity. Hereinafter, the information obtained by decoding the scalar quantized feature quantity will be referred to as quantized auxiliary data. Therefore, the learned auxiliary data side decoding unit 207 is a process of obtaining quantized auxiliary data by decoding the scalar quantized feature amount.
 補助エントロピー符号化部208は、スカラー量子化特徴量と自己符号化対象の補助データ側確率とに基づき、スカラー量子化特徴量のエントロピー符号化を行う。エントロピー符号化は例えば算術符号化である。 The auxiliary entropy encoding unit 208 entropy-encodes the scalar quantized feature amount based on the scalar quantized feature amount and the probability of the auxiliary data to be self-encoded. Entropy coding is, for example, arithmetic coding.
 主データ側確率推定部209は、ベクトル量子化特徴量と量子化補助データとに基づき自己符号化対象の主データ側確率を推定する。 The main-data-side probability estimation unit 209 estimates the main-data-side probability of the self-encoding target based on the vector quantized feature amount and the quantized auxiliary data.
 主エントロピー符号化部210は、ベクトル量子化特徴量と自己符号化対象の主データ側確率とに基づき、ベクトル量子化特徴量のエントロピー符号化を行う。エントロピー符号化は例えば算術符号化である。 The primary entropy encoding unit 210 performs entropy encoding of the vector quantized feature quantity based on the vector quantized feature quantity and the probability of the main data to be self-encoded. Entropy coding is, for example, arithmetic coding.
 データ多重化部211は、エントロピー符号化されたベクトル量子化特徴量と、エントロピー符号化されたスカラー量子化特徴量とをデコーダ212に出力する。このようにして、エンコーダ200は、自己符号化対象をエンコードする。 The data multiplexing unit 211 outputs the entropy-encoded vector quantized feature amount and the entropy-encoded scalar quantized feature amount to the decoder 212 . In this manner, encoder 200 encodes the self-encoding object.
 The encoded data acquisition unit 213 acquires the entropy-encoded vector quantized feature amount and the entropy-encoded scalar quantized feature amount.
 The data separation unit 214 receives the entropy-encoded vector quantized feature amount and the entropy-encoded scalar quantized feature amount acquired by the encoded data acquisition unit 213. The data separation unit 214 outputs the entropy-encoded scalar quantized feature amount to the auxiliary entropy decoding unit 215 and outputs the entropy-encoded vector quantized feature amount to the main entropy decoding unit 217.
 The auxiliary entropy decoding unit 215 performs entropy decoding on the entropy-encoded scalar quantized feature amount using the trained parameterized auxiliary feature amount cumulative distribution function.
 The trained auxiliary data side decoding unit 216 executes the trained auxiliary feature amount decoding process on the result of the entropy decoding by the auxiliary entropy decoding unit 215.
 The main entropy decoding unit 217 performs entropy decoding of the entropy-encoded vector quantized feature amount based on the entropy-encoded vector quantized feature amount and the result of the trained auxiliary feature amount decoding process by the trained auxiliary data side decoding unit 216. More specifically, the main entropy decoding unit 217 performs the entropy decoding of the entropy-encoded vector quantized feature amount using a decoding cumulative distribution function. The decoding cumulative distribution function is the parameterized main data feature amount cumulative distribution function whose parameter values are the values indicated by the result of the trained auxiliary feature amount decoding process by the trained auxiliary data side decoding unit 216.
 The trained main data side decoding unit 218 executes the trained main data feature amount decoding process on the result of the decoding by the main entropy decoding unit 217.
 In this way, the decoder 212 decodes the self-encoding target encoded by the encoder 200. Also in this way, the self-encoding device 2 performs self-encoding of the self-encoding target.
 FIG. 10 is a flowchart showing an example of the flow of processing executed by the encoder 200 in the embodiment. The self-encoding target acquisition unit 201 acquires the self-encoding target X = [X_1, X_2, ..., X_N] (step S201). The self-encoding target X is an N-dimensional vector having N elements from X_1 to X_N. Each element from X_1 to X_N is a tensor; therefore, each element may be a scalar or a vector.
 Next, the trained main data side encoding unit 202 executes the trained main data feature amount acquisition process (step S202). That is, the trained main data side encoding unit 202 encodes the self-encoding target X. Encoding the self-encoding target yields the main data feature amount Y = F_enc(X) of the self-encoding target. The function F_enc(X) is the trained main data encoding function. Note that the main data feature amount Y is a tensor such as a vector.
 Next, the trained auxiliary data side encoding unit 203 executes the trained auxiliary feature amount acquisition process (step S203). By executing this process, the auxiliary feature amount Z = G_enc(Y) of the self-encoding target is obtained. The function G_enc(Y) is the trained main data feature amount encoding function. Note that the auxiliary feature amount Z is a tensor such as a vector.
 Next, the vector quantization unit 204 performs vector quantization on the main data feature amount Y of the self-encoding target (step S204). By performing vector quantization, the vector quantized feature amount Ŷ = [Ŷ_1, Ŷ_2, ..., Ŷ_k] = [Q(Y_1), Q(Y_2), ..., Q(Y_k)] is obtained. Y_i represents the i-th element of the main data feature amount Y of the self-encoding target. Ŷ_i represents the i-th element of the vector quantized feature amount Ŷ. Q is the function expressed by the following equation (11).
 [Equation (11): definition of the vector quantization function Q over the set Λ of lattice points]
 The symbol Λ denotes the set of all lattice points.
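 Although equation (11) itself is not reproduced here, a lattice vector quantizer of this kind maps each element to a representative vector in Λ. The following is a minimal illustrative sketch, not part of the embodiment, that assumes the integer lattice Z^K (or a lattice given by a small basis matrix) and uses helper names chosen for illustration:

    import numpy as np

    def quantize_to_lattice(y, basis=None):
        # Map y to a nearby lattice point. If no basis is given, the integer
        # lattice Z^K is assumed, for which elementwise rounding suffices.
        y = np.asarray(y, dtype=float)
        if basis is None:
            return np.round(y)
        # General case: search candidate lattice points whose coefficients lie
        # around the rounded coordinates in the given basis (small K only).
        coeffs = np.round(np.linalg.solve(basis.T, y))
        offsets = np.array(np.meshgrid(*[[-1, 0, 1]] * len(y))).T.reshape(-1, len(y))
        candidates = (coeffs + offsets) @ basis
        return candidates[np.argmin(np.linalg.norm(candidates - y, axis=1))]

    # Example: quantize a 2-dimensional feature element
    print(quantize_to_lattice([0.4, -1.7]))   # -> [ 0. -2.]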
 Next, the scalar quantization unit 205 performs scalar quantization on the auxiliary feature amount Z of the self-encoding target (step S205). By performing scalar quantization, the scalar quantization unit 205 acquires the scalar quantized feature amount Ẑ = round(Z), where round denotes the rounding operation.
 Next, the trained auxiliary data side probability estimation unit 206 executes the trained auxiliary data side probability estimation process (step S206). By executing this process, the trained auxiliary data side probability estimation unit 206 estimates the auxiliary data side probability of the self-encoding target based on the scalar quantized feature amount Ẑ. Specifically, the auxiliary data side probability of the self-encoding target is expressed by the following equation (12).
 [Equation (12): auxiliary data side probability of the self-encoding target]
 The symbol on the left-hand side of equation (12) represents the auxiliary data side probability of the self-encoding target. The symbol H is the trained parameterized auxiliary feature amount cumulative distribution function.
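 Although equation (12) itself is not reproduced here, it expresses this probability through the trained parameterized cumulative distribution function H. The sketch below is illustrative only: a logistic CDF stands in for H, and the quantization bin is taken to have unit width to match the round() quantizer:

    import numpy as np

    def logistic_cdf(x, loc=0.0, scale=1.0):
        # Illustrative stand-in for the trained parameterized CDF H.
        return 1.0 / (1.0 + np.exp(-(np.asarray(x, dtype=float) - loc) / scale))

    def bin_probability(z_hat, cdf=logistic_cdf, **params):
        # With round() as the scalar quantizer, the bin around z_hat is
        # [z_hat - 0.5, z_hat + 0.5], so its mass is a CDF difference.
        return cdf(z_hat + 0.5, **params) - cdf(z_hat - 0.5, **params)

    z_hat = np.round(np.array([0.2, -1.6, 3.4]))      # scalar quantized feature
    p = bin_probability(z_hat, loc=0.0, scale=1.5)     # per-element probabilities
    bits = -np.log2(p)                                 # code length estimate per element
    print(p, bits)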
 The trained auxiliary data side decoding unit 207 executes the trained auxiliary feature amount decoding process on the scalar quantized feature amount Ẑ (step S207). That is, the trained auxiliary data side decoding unit 207 obtains the quantized auxiliary data Θ = G_dec(Ẑ) by decoding the scalar quantized feature amount. The function G_dec(Ẑ) is the trained auxiliary feature amount decoding function. Note that the quantized auxiliary data Θ is a tensor such as a vector.
 Next, the auxiliary entropy encoding unit 208 performs entropy encoding of the scalar quantized feature amount Ẑ based on the scalar quantized feature amount Ẑ and the auxiliary data side probability of the self-encoding target (step S208).
 Next, the main data side probability estimation unit 209 estimates the main data side probability of the self-encoding target based on the vector quantized feature amount Ŷ and the quantized auxiliary data Θ (step S209).
 Next, the main entropy encoding unit 210 performs entropy encoding of the vector quantized feature amount Ŷ based on the vector quantized feature amount Ŷ and the main data side probability of the self-encoding target (step S210).
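 For intuition only (the probabilities and helper below are illustrative assumptions, not values from the embodiment), the estimated probabilities determine the code length that an entropy coder such as an arithmetic coder can approach:

    import numpy as np

    def ideal_code_length_bits(probabilities):
        # Lower bound (in bits) on the entropy-coded size of a symbol sequence
        # whose per-symbol probabilities come from the probability estimators.
        p = np.asarray(probabilities, dtype=float)
        return float(np.sum(-np.log2(p)))

    # Assumed example: probabilities assigned to four vector quantized elements
    p_main = [0.12, 0.40, 0.05, 0.22]
    print(ideal_code_length_bits(p_main))   # about 10.9 bits for these four elements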
 Next, the data multiplexing unit 211 outputs the entropy-encoded vector quantized feature amount and the entropy-encoded scalar quantized feature amount to the decoder 212 (step S211).
 The series of processes from step S201 to step S211 is an example of the encoding process by the encoder 200. Note that the processes of steps S201 to S211 may be executed in any order as long as causality is not violated.
 FIG. 11 is a flowchart showing an example of the flow of processing executed by the decoder 212 in the embodiment. The encoded data acquisition unit 213 acquires the result of encoding by the encoder 200 (step S301). Specifically, the result of encoding by the encoder 200 is the entropy-encoded vector quantized feature amount and the entropy-encoded scalar quantized feature amount output in step S211.
 Next, the data separation unit 214 separates the entropy-encoded scalar quantized feature amount and the entropy-encoded vector quantized feature amount acquired in step S301 (step S302). Specifically, separating means outputting the entropy-encoded scalar quantized feature amount acquired in step S301 to the auxiliary entropy decoding unit 215 and outputting the entropy-encoded vector quantized feature amount acquired in step S301 to the main entropy decoding unit 217.
 Next, the auxiliary entropy decoding unit 215 performs entropy decoding on the entropy-encoded scalar quantized feature amount using the trained parameterized auxiliary feature amount cumulative distribution function (step S303).
 Next, the trained auxiliary data side decoding unit 216 executes the trained auxiliary feature amount decoding process on the result of the entropy decoding by the auxiliary entropy decoding unit 215 (step S304).
 Next, the main entropy decoding unit 217 performs entropy decoding of the entropy-encoded vector quantized feature amount using the decoding cumulative distribution function (step S305).
 Next, the trained main data side decoding unit 218 executes the trained main data feature amount decoding process on the result of the decoding by the main entropy decoding unit 217 (step S306).
 The series of processes from step S301 to step S306 is an example of the decoding process by the decoder 212. Note that the processes of steps S301 to S306 are executed after the encoding process by the encoder 200, such as step S211, and may otherwise be executed in any order as long as causality is not violated.
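 One possible reading, stated here purely as an assumption for illustration, is that the decoded quantized auxiliary data Θ supplies per-element location and scale parameters of the parameterized main data feature amount cumulative distribution function; the sketch below shows how such a decoding cumulative distribution function could be built and evaluated under that assumption:

    import numpy as np

    def logistic_cdf(x, loc, scale):
        # Illustrative parameterized CDF for the main data feature amount.
        return 1.0 / (1.0 + np.exp(-(np.asarray(x, dtype=float) - loc) / scale))

    def build_decoding_cdf(theta):
        # Assumed layout of Θ: first half = locations, second half = scales.
        theta = np.asarray(theta, dtype=float)
        k = theta.size // 2
        loc, scale = theta[:k], np.abs(theta[k:]) + 1e-6
        return lambda x: logistic_cdf(x, loc, scale)

    theta = np.array([0.0, 1.5, -0.3, 0.8, 1.2, 0.5])   # decoded auxiliary data (assumed)
    cdf = build_decoding_cdf(theta)
    print(cdf(np.zeros(3)))   # CDF of each of the three main-feature elements at 0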
<Description of hardware>
 FIG. 12 is a diagram showing an example of the hardware configuration of the learning device 1 in the embodiment. The learning device 1 includes a control unit 11 that includes a processor 91 such as a CPU (Central Processing Unit) and a memory 92 connected via a bus, and executes a program. By executing the program, the learning device 1 functions as a device including the control unit 11, an input unit 12, a communication unit 13, a storage unit 14, and an output unit 15.
 More specifically, the processor 91 reads the program stored in the storage unit 14 and stores the read program in the memory 92. The processor 91 executes the program stored in the memory 92, whereby the learning device 1 functions as a device including the control unit 11, the input unit 12, the communication unit 13, the storage unit 14, and the output unit 15.
 The control unit 11 controls the operation of each functional unit included in the learning device 1. For example, the control unit 11 controls the operation of the output unit 15. The control unit 11 records, for example, various information generated by the learning in the storage unit 14.
 The input unit 12 includes input devices such as a mouse, a keyboard, and a touch panel. The input unit 12 may be configured as an interface that connects these input devices to the learning device 1. The input unit 12 receives input of various information to the learning device 1.
 The communication unit 13 includes a communication interface for connecting the learning device 1 to an external device. The communication unit 13 communicates with the external device via a wired or wireless connection. The external device is, for example, the device that transmits the main data used for learning. The communication unit 13 acquires the main data used for learning through communication with that device. The external device may also be, for example, the self-encoding device 2. The communication unit 13 transmits the network learning result to the self-encoding device 2 through communication with the self-encoding device 2. Note that the main data does not necessarily have to be input via the communication unit 13 and may instead be input to the input unit 12.
 The storage unit 14 is configured using a computer-readable storage medium device such as a magnetic hard disk device or a semiconductor storage device. The storage unit 14 stores various information about the learning device 1. The storage unit 14 stores, for example, information input via the input unit 12 or the communication unit 13, and various information generated by the execution of learning.
 The storage unit 14 stores in advance, for example, the probability distribution used to obtain the occurrence probability of each element of the tensor representing the auxiliary feature amount. The storage unit 14 stores in advance, for example, the parameterized auxiliary feature amount cumulative distribution function. The storage unit 14 stores in advance, for example, the parameterized main data feature amount cumulative distribution function. The storage unit 14 stores in advance, for example, the representative vector information. The storage unit 14 stores, for example, the result of the hyperrectangle partitioning.
 The storage unit 14 stores in advance, for example, the initial values of the parameters of the learning network 100. The initial values are, for example, random values. The storage unit 14 stores, for example, the network learning result.
 The output unit 15 outputs various information. The output unit 15 includes a display device such as a CRT (Cathode Ray Tube) display, a liquid crystal display, or an organic EL (Electro-Luminescence) display. The output unit 15 may be configured as an interface that connects these display devices to the learning device 1. The output unit 15 outputs, for example, information input to the input unit 12. The output unit 15 may also display, for example, the result of learning.
 FIG. 13 is a diagram showing an example of the configuration of the control unit 11 included in the learning device 1 in the embodiment. The control unit 11 includes a learning unit 10, a storage control unit 120, a communication control unit 130, and an output control unit 140. The storage control unit 120 records various information in the storage unit 14. The communication control unit 130 controls the operation of the communication unit 13. The output control unit 140 controls the operation of the output unit 15.
 FIG. 14 is a diagram showing an example of the hardware configuration of the self-encoding device 2 in the embodiment. The self-encoding device 2 includes a control unit 21 that includes a processor 93 such as a CPU (Central Processing Unit) and a memory 94 connected via a bus, and executes a program. By executing the program, the self-encoding device 2 functions as a device including the control unit 21, an input unit 22, a communication unit 23, a storage unit 24, and an output unit 25.
 More specifically, the processor 93 reads the program stored in the storage unit 24 and stores the read program in the memory 94. The processor 93 executes the program stored in the memory 94, whereby the self-encoding device 2 functions as a device including the control unit 21, the input unit 22, the communication unit 23, the storage unit 24, and the output unit 25.
 The control unit 21 controls the operation of each functional unit included in the self-encoding device 2. For example, the control unit 21 controls the operation of the output unit 25. The control unit 21 records, for example, various information generated by the encoding by the encoder 200 and the decoding by the decoder 212 in the storage unit 24.
 The input unit 22 includes input devices such as a mouse, a keyboard, and a touch panel. The input unit 22 may be configured as an interface that connects these input devices to the self-encoding device 2. The input unit 22 receives input of various information to the self-encoding device 2.
 The communication unit 23 includes a communication interface for connecting the self-encoding device 2 to an external device. The communication unit 23 communicates with the external device via a wired or wireless connection. The external device is, for example, the device that transmits the self-encoding target. The communication unit 23 acquires the self-encoding target through communication with that device. The external device may also be, for example, the learning device 1. The communication unit 23 receives the network learning result through communication with the learning device 1. Note that the self-encoding target does not necessarily have to be input via the communication unit 23 and may instead be input to the input unit 22.
 The storage unit 24 is configured using a computer-readable storage medium device such as a magnetic hard disk device or a semiconductor storage device. The storage unit 24 stores various information about the self-encoding device 2. The storage unit 24 stores, for example, information input via the input unit 22 or the communication unit 23, and various information generated by the execution of the encoding by the encoder 200 and the decoding by the decoder 212.
 The storage unit 24 stores, for example, the network learning result. The storage unit 24 stores, for example, the representative vector information in advance. The storage unit 24 stores, for example, the result of the hyperrectangle partitioning.
 The output unit 25 outputs various information. The output unit 25 includes a display device such as a CRT (Cathode Ray Tube) display, a liquid crystal display, or an organic EL (Electro-Luminescence) display. The output unit 25 may be configured as an interface that connects these display devices to the self-encoding device 2. The output unit 25 outputs, for example, information input to the input unit 22. The output unit 25 may also output, for example, the result of self-encoding of the self-encoding target.
 FIG. 15 is a diagram showing an example of the configuration of the control unit 21 included in the self-encoding device 2 in the embodiment. The control unit 21 includes a self-encoding execution unit 20, a storage control unit 220, a communication control unit 230, and an output control unit 240. The self-encoding execution unit 20 performs self-encoding on the self-encoding target. The self-encoding execution unit 20 includes the encoder 200 and the decoder 212. The self-encoding execution unit 20 performs self-encoding of the self-encoding target by performing the encoding by the encoder 200 and the decoding by the decoder 212.
 The storage control unit 220 records various information in the storage unit 24. The communication control unit 230 controls the operation of the communication unit 23. The output control unit 240 controls the operation of the output unit 25.
 The learning device 1 configured in this way learns the peripheral processing for vector quantization using representative vector information, which is information indicating the positions of representative vectors arranged in a lattice in a vector space, as in LatticeVQ. The learning device 1 then estimates the occurrence probability of each representative vector using the result of the hyperrectangle partitioning. As described above, when representative vectors are used, the occurrence probability of a representative vector may be estimated using Voronoi partitioning, but the integration over a Voronoi region is not easy.
 Therefore, the learning device 1, which estimates the occurrence probability of representative vectors using the hyperrectangles obtained by hyperrectangle partitioning, can reduce the burden required to obtain self-encoding processing using vector quantization. This reduces the burden, starting from the learning stage, required to realize self-encoding using vector quantization. The learning device 1 can therefore reduce the burden required for self-encoding using vector quantization.
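 As a reference, a minimal sketch of this integration idea follows, under assumptions that are not part of the embodiment: the parameterized density factorizes per dimension, an illustrative logistic CDF stands in for the trained parameterization, and the hyperrectangle assigned to each lattice point is an axis-aligned box of side 1 centered on it. Under these assumptions the integral reduces to a product of one-dimensional CDF differences, which is far easier than integrating over a Voronoi region:

    import numpy as np

    def logistic_cdf(x, loc, scale):
        return 1.0 / (1.0 + np.exp(-(np.asarray(x, dtype=float) - loc) / scale))

    def hyperrectangle_probability(lattice_point, loc, scale, half_width=0.5):
        # Integral of a per-dimension factorized density over the axis-aligned
        # hyperrectangle centered on lattice_point = product of CDF differences.
        c = np.asarray(lattice_point, dtype=float)
        upper = logistic_cdf(c + half_width, loc, scale)
        lower = logistic_cdf(c - half_width, loc, scale)
        return float(np.prod(upper - lower))

    # Assumed example in K = 3 dimensions
    p = hyperrectangle_probability([0.0, 1.0, -1.0],
                                   loc=np.array([0.2, 0.5, -0.5]),
                                   scale=np.array([1.0, 0.7, 0.9]))
    print(p)   # occurrence probability assigned to this representative vector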
 In addition, because the learning device 1 configured in this way uses the representative vectors to learn the peripheral processing for vector quantization, it can reduce the burden required for learning the representative vectors. Furthermore, because it uses the representative vectors to learn the peripheral processing for vector quantization, it does not use the memory that would be needed when learning the representative vectors themselves. Therefore, the learning device 1 can reduce the frequency of memory shortage problems and can process main data of larger dimensions.
 Incidentally, as described above, in learning for vector quantization it is necessary to add noise instead of performing quantization. However, when representative vector information such as LatticeVQ is used, the extent of the Voronoi region becomes the quantization error, so generating noise whose probability distribution is a Gaussian distribution is itself not easy.
 On the other hand, in learning, the learning device 1 configured in this way uses as noise only those samples, among samples randomly generated within the (K−1)-dimensional sphere circumscribing the Voronoi region in the K-dimensional vector space, that fall inside the Voronoi region. This makes it possible to generate noise following a Gaussian distribution. Therefore, the learning device 1 can reduce the burden required to obtain self-encoding processing using vector quantization, and thus can reduce the burden required for self-encoding using vector quantization.
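 A rough sketch of this rejection sampling, under simplifying assumptions that are not part of the embodiment (the lattice is taken to be the integer lattice Z^K, so the Voronoi region of the origin is the unit cube and membership can be checked by rounding; the helper names are illustrative):

    import numpy as np

    def sample_in_ball(k, radius, rng):
        # Uniform sample inside the K-dimensional ball of the given radius.
        v = rng.normal(size=k)
        v /= np.linalg.norm(v)
        return radius * rng.uniform() ** (1.0 / k) * v

    def voronoi_noise(k, rng, max_tries=10000):
        # Keep only samples that fall inside the Voronoi region of the origin
        # (for Z^K this means the nearest lattice point is the origin itself).
        radius = 0.5 * np.sqrt(k)           # circumscribing sphere of the unit cube
        for _ in range(max_tries):
            s = sample_in_ball(k, radius, rng)
            if np.all(np.round(s) == 0.0):  # inside the Voronoi region of 0
                return s
        raise RuntimeError("rejection sampling did not converge")

    rng = np.random.default_rng(0)
    noise = voronoi_noise(k=8, rng=rng)     # noise added to one main-feature vector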
 The self-encoding device 2 configured in this way performs self-encoding using vector quantization by using the result of learning by the learning device 1. Therefore, the burden required for self-encoding using vector quantization can be reduced.
(Modification)
 In adding noise to the main data feature amount, samples generated uniformly within a (K−1)-dimensional sphere whose volume is made to approximate the volume of the Voronoi region in the K-dimensional vector space may be used as noise points. In adding noise to the main data feature amount, samples generated uniformly within a hyperrectangle obtained by the hyperrectangle partitioning may also be used as noise points.
 Therefore, the process of adding noise to the main data feature amount (that is, the vector noise addition process) may be any one of a first noise addition process, a second noise addition process, and a third noise addition process. The first noise addition process adds as noise those samples, among samples randomly generated within the (K−1)-dimensional sphere circumscribing the Voronoi region in the K-dimensional vector space, that fall inside the Voronoi region. That is, the first noise addition process is the process described with reference to FIGS. 3 and 4.
 The second noise addition process adds as noise samples generated uniformly within a (K−1)-dimensional sphere whose volume is made to approximate the volume of the Voronoi region in the K-dimensional vector space. The third noise addition process adds as noise samples generated uniformly within a region that partitions the vector space in which the representative vectors are arranged in a lattice, that contains one lattice point of the vector space, and whose shape is a hyperrectangle. A sketch of these two variants follows.
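 The second and third noise addition processes can be sketched in the same spirit; the assumptions below (integer lattice Z^K, so the Voronoi volume is 1 and the hyperrectangle is the unit cube, plus the illustrative helper names) are not part of the embodiment:

    import numpy as np
    from math import gamma, pi

    def volume_matched_ball_noise(k, rng):
        # Second process: uniform sample in a ball whose volume equals the
        # Voronoi region volume (taken here as 1 for Z^K).
        radius = (gamma(k / 2.0 + 1.0) / pi ** (k / 2.0)) ** (1.0 / k)  # unit-volume ball
        v = rng.normal(size=k)
        v /= np.linalg.norm(v)
        return radius * rng.uniform() ** (1.0 / k) * v

    def hyperrectangle_noise(k, rng, half_width=0.5):
        # Third process: uniform sample in the axis-aligned hyperrectangle
        # containing one lattice point (the unit cube for Z^K).
        return rng.uniform(-half_width, half_width, size=k)

    rng = np.random.default_rng(0)
    n2 = volume_matched_ball_noise(8, rng)
    n3 = hyperrectangle_noise(8, rng)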
 Note that the learning device 1 may be implemented using a plurality of information processing devices communicably connected via a network. In this case, the functional units of the learning device 1 may be distributed and implemented across the plurality of information processing devices.
 Note that the self-encoding device 2 may also be implemented using a plurality of information processing devices communicably connected via a network. In this case, the functional units of the self-encoding device 2 may be distributed and implemented across the plurality of information processing devices.
 All or some of the functions of the learning device 1 and the self-encoding device 2 may be realized using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The program may be recorded on a computer-readable recording medium. Computer-readable recording media include portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and storage devices such as hard disks built into computer systems. The program may also be transmitted via a telecommunication line.
 Although the embodiment of the present invention has been described in detail with reference to the drawings, the specific configuration is not limited to this embodiment and includes designs and the like within a scope not departing from the gist of the present invention.
 1… learning device, 10… learning unit, 100… learning network, 101… main data acquisition unit, 102… main data side encoding unit, 103… auxiliary data side encoding unit, 104… main data side noise addition unit, 105… auxiliary data side noise addition unit, 106… auxiliary data side probability estimation unit, 107… auxiliary data side decoding unit, 108… auxiliary entropy acquisition unit, 109… main data side probability estimation unit, 110… main entropy acquisition unit, 111… main data side decoding unit, 112… reconstruction error calculation unit, 113… optimization unit, 2… self-encoding device, 200… encoder, 201… self-encoding target acquisition unit, 202… trained main data side encoding unit, 203… trained auxiliary data side encoding unit, 204… vector quantization unit, 205… scalar quantization unit, 206… trained auxiliary data side probability estimation unit, 207… trained auxiliary data side decoding unit, 208… auxiliary entropy encoding unit, 209… main data side probability estimation unit, 210… main entropy encoding unit, 211… data multiplexing unit, 212… decoder, 213… encoded data acquisition unit, 214… data separation unit, 215… auxiliary entropy decoding unit, 216… trained auxiliary data side decoding unit, 217… main entropy decoding unit, 218… trained main data side decoding unit, 11… control unit, 12… input unit, 13… communication unit, 14… storage unit, 15… output unit, 120… storage control unit, 130… communication control unit, 140… output control unit, 21… control unit, 22… input unit, 23… communication unit, 24… storage unit, 25… output unit, 20… self-encoding execution unit, 220… storage control unit, 230… communication control unit, 240… output control unit, 91… processor, 92… memory, 93… processor, 94… memory

Claims (8)

  1. A learning device comprising:
     a learning unit that updates, by learning, encoding and decoding processes in a self-encoding process using vector quantization, the self-encoding process using a main data feature amount that is a feature amount of a self-encoding target and an auxiliary feature amount that is a feature amount of the main data feature amount, and performing entropy encoding of a result of vector quantization of the main data feature amount and entropy encoding of a result of scalar quantization of the auxiliary feature amount,
     wherein, in the learning, the learning unit executes a main data side probability estimation process of estimating an occurrence probability of each element of a tensor representing the main data feature amount, and
     the main data side probability estimation process estimates the occurrence probability using a result of integrating a parameterized probability density function over an integration region that is a region partitioning a vector space in which representative vectors are arranged in a lattice, that contains one lattice point of the vector space, and whose shape is a hyperrectangle.
  2. The learning device according to claim 1, wherein the learning unit updates the encoding and decoding processes using an entropy of the main data feature amount obtained based on the occurrence probability estimated by the main data side probability estimation process.
  3. The learning device according to claim 1 or 2, wherein the vector quantization is LatticeVQ.
  4. A self-encoding device comprising:
     a self-encoding target acquisition unit that acquires a self-encoding target; and
     a self-encoding execution unit that performs self-encoding, by vector quantization, of the target acquired by the self-encoding target acquisition unit, using a trained encoding process and a trained decoding process obtained using a learning device, the learning device comprising a learning unit that updates, by learning, encoding and decoding processes in a self-encoding process using vector quantization, the self-encoding process using a main data feature amount that is a feature amount of a self-encoding target and an auxiliary feature amount that is a feature amount of the main data feature amount, and performing entropy encoding of a result of vector quantization of the main data feature amount and entropy encoding of a result of scalar quantization of the auxiliary feature amount, wherein, in the learning, the learning unit executes a main data side probability estimation process of estimating an occurrence probability of each element of a tensor representing the main data feature amount, and the main data side probability estimation process estimates the occurrence probability using a result of integrating a parameterized probability density function over an integration region that is a region partitioning a vector space in which representative vectors are arranged in a lattice, that contains one lattice point of the vector space, and whose shape is a hyperrectangle.
  5. A learning method comprising:
     a learning step of updating, by learning, encoding and decoding processes in a self-encoding process using vector quantization, the self-encoding process using a main data feature amount that is a feature amount of a self-encoding target and an auxiliary feature amount that is a feature amount of the main data feature amount, and performing entropy encoding of a result of vector quantization of the main data feature amount and entropy encoding of a result of scalar quantization of the auxiliary feature amount,
     wherein, in the learning, the learning step executes a main data side probability estimation process of estimating an occurrence probability of each element of a tensor representing the main data feature amount, and
     the main data side probability estimation process estimates the occurrence probability using a result of integrating a parameterized probability density function over an integration region that is a region partitioning a vector space in which representative vectors are arranged in a lattice, that contains one lattice point of the vector space, and whose shape is a hyperrectangle.
  6. A self-encoding method comprising:
     a self-encoding target acquisition step of acquiring a self-encoding target; and
     a self-encoding execution step of performing self-encoding, by vector quantization, of the target acquired in the self-encoding target acquisition step, using a trained encoding process and a trained decoding process obtained using a learning device, the learning device comprising a learning unit that updates, by learning, encoding and decoding processes in a self-encoding process using vector quantization, the self-encoding process using a main data feature amount that is a feature amount of a self-encoding target and an auxiliary feature amount that is a feature amount of the main data feature amount, and performing entropy encoding of a result of vector quantization of the main data feature amount and entropy encoding of a result of scalar quantization of the auxiliary feature amount, wherein, in the learning, the learning unit executes a main data side probability estimation process of estimating an occurrence probability of each element of a tensor representing the main data feature amount, and the main data side probability estimation process estimates the occurrence probability using a result of integrating a parameterized probability density function over an integration region that is a region partitioning a vector space in which representative vectors are arranged in a lattice, that contains one lattice point of the vector space, and whose shape is a hyperrectangle.
  7. A program for causing a computer to function as the learning device according to any one of claims 1 to 3.
  8. A program for causing a computer to function as the self-encoding device according to claim 4.
PCT/JP2021/042980 2021-11-24 2021-11-24 Learning device, autoencoding device, learning method, autoencoding method, and program WO2023095204A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023563381A JPWO2023095204A1 (en) 2021-11-24 2021-11-24
PCT/JP2021/042980 WO2023095204A1 (en) 2021-11-24 2021-11-24 Learning device, autoencoding device, learning method, autoencoding method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/042980 WO2023095204A1 (en) 2021-11-24 2021-11-24 Learning device, autoencoding device, learning method, autoencoding method, and program

Publications (1)

Publication Number Publication Date
WO2023095204A1 true WO2023095204A1 (en) 2023-06-01

Family

ID=86539066

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/042980 WO2023095204A1 (en) 2021-11-24 2021-11-24 Learning device, autoencoding device, learning method, autoencoding method, and program

Country Status (2)

Country Link
JP (1) JPWO2023095204A1 (en)
WO (1) WO2023095204A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024068368A1 (en) * 2022-09-27 2024-04-04 Interdigital Ce Patent Holdings, Sas Uniform vector quantization for end-to-end image/video compression

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
EIRIKUR AGUSTSSON, MENTZER FABIAN, TSCHANNEN MICHAEL, CAVIGELLI LUKAS, TIMOFTE RADU, BENINI LUCA, VAN GOOL LUC: "Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations", 8 June 2017 (2017-06-08), XP055730082, Retrieved from the Internet <URL:https://arxiv.org/pdf/1704.00648.pdf> [retrieved on 20200911] *
LIJUN ZHAO; HUIHUI BAI; ANHONG WANG; YAO ZHAO: "Deep Optimized Multiple Description Image Coding via Scalar Quantization Learning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 12 January 2020 (2020-01-12), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081577430 *

Also Published As

Publication number Publication date
JPWO2023095204A1 (en) 2023-06-01

Similar Documents

Publication Publication Date Title
US11558620B2 (en) Image encoding and decoding, video encoding and decoding: methods, systems and training methods
Xin et al. Maximal sparsity with deep networks?
CN113424202A (en) Adjusting activation compression for neural network training
US11221990B2 (en) Ultra-high compression of images based on deep learning
Cohen et al. Nonlinear approximation of random functions
Won et al. Stochastic image processing
CN111818346A (en) Image encoding method and apparatus, image decoding method and apparatus
US20210065052A1 (en) Bayesian optimization of sparsity ratios in model compression
JP5349407B2 (en) A program to cluster samples using the mean shift procedure
CN108959322B (en) Information processing method and device for generating image based on text
Saravanan et al. Intelligent Satin Bowerbird Optimizer Based Compression Technique for Remote Sensing Images.
WO2023095204A1 (en) Learning device, autoencoding device, learning method, autoencoding method, and program
WO2023095207A1 (en) Learning device, autoencoding device, learning method, autoencoding method, and program
US20240202982A1 (en) 3d point cloud encoding and decoding method, compression method and device based on graph dictionary learning
US11544881B1 (en) Method and data processing system for lossy image or video encoding, transmission and decoding
Xing et al. Flexible signal denoising via flexible empirical Bayes shrinkage
CN115311515A (en) Training method for generating countermeasure network by mixed quantum classical and related equipment
US11922018B2 (en) Storage system and storage control method including dimension setting information representing attribute for each of data dimensions of multidimensional dataset
Mathieu et al. Geometric neural diffusion processes
Psenka et al. Representation learning via manifold flattening and reconstruction
Kharinov Model of the quasi-optimal hierarchical segmentation of a color image
WO2023248427A1 (en) Learning device, autoencoding device, learning method, autoencoding method, and program
WO2023248431A1 (en) Training device, training method, and program
CN116261856A (en) Point cloud layering method, decoder, encoder and storage medium
JP2018182531A (en) Division shape determining apparatus, learning apparatus, division shape determining method, and division shape determining program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21965578

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023563381

Country of ref document: JP