WO2021140639A1

WO2021140639A1 - Information processing device, method for predicting character string after grammatical compression, and computer-readable medium storing program therefor

Info

Publication number: WO2021140639A1
Application number: PCT/JP2020/000629
Authority: WO
Inventors: 耀一佐々木; 康佑秋元
Original assignee: 日本電気株式会社
Priority date: 2020-01-10
Filing date: 2020-01-10
Publication date: 2021-07-15

Abstract

Conventional devices have had a problem in that it is difficult to perform prediction processing without decompressing compressed data. This information processing device includes: a parameter database (D2) which stores at least a coupler parameter, which is an action setting value for a coupler used in calculating a vector value corresponding to a character string included in compressed data (D1) generated by the implementation of grammatical compression, and a predictor parameter; and a prediction processing unit (11) that interprets a grammatical compression rule from the compressed data (D1), uses the grammatical compression rule and the coupler parameter to generate a character string feature vector corresponding to the character string included in the compressed data (D1), and uses the predictor parameter to output a predicted value for the character string from the character string feature vector.

Description

Information processing device, method of predicting character strings after grammar-based code, and computer-readable medium in which the program is stored.

The present invention relates to an information processing device, a method for predicting a character string after grammatical compression, and a computer-readable medium in which the program is stored. In particular, grammatically compressed character string data is input and included in the character string data. It relates to an information processing device for predicting a character string, a method for predicting a character string after grammatical compression, and a computer-readable medium in which the program is stored.

In recent years, big data has been compressed and stored in order to suppress the huge amount of data to the smallest possible storage capacity. For such compressed data, it is necessary to perform prediction processing after decompressing the data in order to recognize the contents existing in the data. However, when the prediction processing for such compressed data is performed, there arises a problem that the preprocessing time for decompression increases in proportion to the data capacity. Therefore, Patent Document 1 discloses an example in which compression processing is performed on a series to generate compressed data and learning processing using the compressed data is performed, and machine learning using image data compressed in Non-Patent Document 1 is disclosed. The task is disclosed.

More specifically, in Patent Document 1, the compression unit generates a context-free grammar in which the input symbol sequence is compressed, and the model generation unit generates a syntax tree representing the context-free grammar and model parameters. The initial value is set to, and the learning unit corresponds to the non-terminal symbol for each non-terminal symbol based on the model parameter and the syntax tree, and the topic is assigned to the latent variable. From, the inner probability, which is the probability that a terminal symbol is obtained, is calculated by tracing the syntax tree, and for each non-terminal symbol, the number of times a topic is assigned to the latent variable of the inner point corresponding to the non-terminal symbol is expected for each topic. The outer expected value, which is a value, is calculated, and the model parameters are updated based on the calculated inner probability and the outer expected value, and this process is repeated until a predetermined termination condition is satisfied.

Further, in Non-Patent Document 1, for JPEG data in which an image is irreversibly compressed, a neural network is used based on the Y, Cb, and Cr representations in the process of returning the uncompressed image to RGB without decompressing all the data. The machine learning task to be input is disclosed.

Japanese Unexamined Patent Publication No. 2016-212742

However, when the character string data is grammatically compressed, the compression method is different from that of the PEG data. Therefore, in the proposed method described in Non-Patent Document 1, even if the grammatically compressed character string data is input to the input of the neural network, I can't solve the problem. Further, in Patent Document 1, although the symbol sequence is the target of processing, the context-free grammar generated by the compression unit must be used, and there is a problem that processing cannot be performed with compressed data compressed by a general compression method. There is.

One aspect of the information processing apparatus according to the present invention is a coupler parameter, which is an operation setting value of a coupler used for calculating a vector value corresponding to a character string included in a compressed data generated by performing grammatical compression, and a prediction. A parameter database that stores at least device parameters and a character string feature vector that interprets the grammatical compression rules from the compressed data and uses the grammatical compression rules and the combiner parameters to correspond to the character strings contained in the compressed data. Is generated, and has a prediction processing unit that outputs the predicted value of the character string from the character string feature vector using the predictor parameter.

One aspect of the method for predicting a character string after grammatical compression according to the present invention is to predict the character string included in the compressed data generated by performing grammatical compression processing on the character string using a computer. In the character string prediction method of, the compressed data generated by applying grammatical compression to the character string is input, the rule of the grammatical compression is interpreted from the compressed data, and the characters included in the compressed data. A character string feature vector is applied to a column to generate a character string feature vector, and the predicted value of the character string indicated by the character string feature vector is output using the predictor parameters.

One aspect of the computer-readable medium in which the program according to the present invention is stored is a program that predicts the character string included in the compressed data generated by performing grammatical compression processing on the character string by using a calculation function of the computer. Is a computer-readable medium in which is stored, and the program reads compressed data generated by performing grammatical compression on a character string, interprets the grammatical compression rules from the compressed data, and performs the compressed data. A character string feature vector is applied to the character string included in the character string feature vector to generate a character string feature vector, and the predicted value of the character string indicated by the character string feature vector is output using the predictor parameter.

According to the information processing apparatus according to the present invention, the method of predicting the character string after grammar-based compression, and the computer-readable medium in which the program is stored, the grammar-based character string data is not decompressed by the predictor. Can be given.

It is a block diagram of the information processing apparatus which concerns on Embodiment 1. FIG. It is a figure explaining one of the specific examples of the processing in the information processing apparatus which concerns on Embodiment 1. FIG. It is a block diagram of the information processing apparatus which concerns on Embodiment 2. FIG. It is a figure explaining one of the specific examples of the processing in the information processing apparatus which concerns on Embodiment 2. FIG. It is a block diagram of the information processing apparatus which concerns on Embodiment 3. FIG. It is a block diagram of the information processing apparatus which concerns on Embodiment 4. FIG. It is a figure explaining one of the specific examples of the processing in the information processing apparatus which concerns on Embodiment 4. FIG.

Embodiment 1
Hereinafter, embodiments of the present invention will be described with reference to the drawings. FIG. 1 shows a block diagram of the information processing apparatus according to the first embodiment. As shown in FIG. 1, the information processing apparatus 1 according to the first embodiment includes an input unit 10, a prediction processing unit 11, an output unit 12, and a parameter database D2. Further, in the information processing apparatus 1, compressed data (for example, grammar-based compressed character string D1) is input from another database or from the outside.

The information processing device 1 is, for example, an arithmetic unit such as a computer, and the functions of the input unit 10, the prediction processing unit 11, and the output unit 12 are realized by executing a program. The information processing device 1 may be connected to another computer or a database via a network, and may operate independently without using a network.

The input unit 10 reads the grammar-based compressed character string D1 from a storage device (not shown). As this storage device, a hard disk, SSD (Solid State Disk), non-volatile memory and the like can be considered. The output unit 12 outputs the predicted value output by the prediction processing unit 11 to another storage device. The parameter database D2 stores the character feature vector, the combiner parameter, and the predictor parameter used in the prediction processing unit 11.

The prediction processing unit 11 predicts the character string included in the grammar-based compressed character string D1 read by the input unit 10. At this time, the prediction processing unit 11 has one of the features of predicting the character string included in the grammar-based compressed character string D1 without decompressing the grammar-based compressed character string D1. More specifically, the prediction processing unit 11 interprets the rules of grammar-based compression from the grammar-based compression character string D1 using the combiner parameter, and generates a character string feature vector corresponding to the character string included in the compressed data. Then, the predicted value of the character string is output from the character string feature vector using the predictor parameter.

The prediction processing unit 11 has a character string feature vector generation unit 20 and a prediction unit 21. The character string feature vector generation unit 20 applies the character feature vector stored in the parameter database D2 to the grammar-based character string D1 to generate the character string feature vector. At this time, the character string feature vector generation unit 20 selects a character feature vector to be applied to the characters constituting the character string included in the grammar-compressed character string D1 based on the grammar-based compression rule of the grammar-compressed character string D1. The prediction unit 21 is, for example, an arithmetic unit using a neural network technique such as RSTM (Long Short Term Memory). As the prediction unit 21, not only recurrent neural networks such as LSTM but also various types of neural networks such as feedforward neural networks and convolutional neural networks can be applied.

Here, the prediction processing for the grammar-based compressed character string D1 and the grammar-based compressed character string D1 will be described. Therefore, FIG. 2 shows a diagram illustrating one specific example of processing in the information processing apparatus according to the first embodiment.

In the example shown in FIG. 2, the character string "ATTTTTTTTTCGA" is shown as the character string to be compressed. This string is, for example, something that is used as part of a base sequence. However, the character string may be not only a base sequence but also a general language such as a word or a sentence.

Further, the grammar-based compression character string D1 is a character string that has undergone grammar-based compression processing. As this grammar compression, methods such as LZ78, LXW, and LZD are known. In the example shown in FIG. 2, the target character string is grammatically compressed by LZ78 with respect to the character string. As shown in FIG. 2, in the LZ78, the factor is set for each continuous character string in which one character is added to the portion where the characters are continuous in the order already mentioned in the combination of consecutive characters.

In the example shown in FIG. 2, "A" is assigned to factor f1, "T" is assigned to factor f2, "TT" is assigned to factor f3, "TTT" is assigned to factor f4, and "TTC" is assigned to factor f5. Allocation, "G" was assigned to factor f6, and "A" was assigned to factor f7. Further, in the LZ78, the factor f0 is set as an empty string having a length of 0.

Then, in LZ78, if the character string belonging to the target factor is the first character to appear, or if the character string belonging to the target factor is not assigned to another factor as a continuous character string, the factor f0 and the target are used. Set the combination with the characters belonging to the factor. In the example shown in FIG. 2, the factors f1, f2, f6, and f7 are replaced with a character string that is a combination of the factor f0 and the characters included in the target factor.

On the other hand, in the example shown in FIG. 2, since the factor f3 is a combination of the factor f2 and the character T, it is replaced with the character strings of the factors f2 and T after compression. Since the factor f4 is a combination of the factor f3 and the character T, it is replaced with the character string of the factors f3 and T after compression. Since the factor f5 is a combination of the factor f3 and the character C, it is replaced with the character strings of the factors f3 and C after compression.

In this way, in grammar compression, factors are assigned to combinations of consecutive character strings that appear, and factors that appear more than once are replaced with character strings that combine factors and characters. This compressed character string is called a tuple string. In the information processing apparatus 1 according to the first embodiment, this tuple string is input to the prediction processing unit 11 as a grammar-based compressed character string D1.

Then, in the information processing apparatus 1 according to the first embodiment, the character string feature vector generation unit 20 applies the character feature vector stored in the parameter database D2 based on the grammar compression rule of the grammar-based code string D1 to characterize the character. Generate a column feature vector. In the character string feature vector generation unit 20, the character string feature vector vf1 to vf7, which is a character string component of the character string feature vector corresponding to the factors f1 to f7, is combined by using the combiner M. To generate. Then, the information processing device 1 combines the character string feature vectors vf1 to vf7 and inputs them to the prediction unit 21.

In the example shown in FIG. 2, since the tuple string corresponding to the factor f1 is (f0, A) in the character string feature vector generation unit 20, the character string using the character feature vector v (A) corresponding to the character A is used. The feature vector vf1 is generated.

Since the tuple string corresponding to the factor f2 is (f0, T), the character string feature vector generation unit 20 generates the character string feature vector vf2 using the character feature vector v (T) corresponding to the character T.

Since the tuple string corresponding to the factor f3 is (f2, T), the character string feature vector generation unit 20 reads the character belonging to the factor f2 and reads out the character feature vector v (T) corresponding to the character T. Then, the character string feature vector generation unit 20 combines the character feature vector v (T) corresponding to the character T included in the factor f3 and the read character feature vector v (T) with the combiner M to form the character string feature vector. Generate vf3.

Since the tuple string corresponding to the factor f4 is (f3, T), the character string feature vector generation unit 20 reads the character belonging to the factor f3 and reads out two character feature vectors v (T) corresponding to the character T. .. Then, the character string feature vector generation unit 20 combines the character feature vector v (T) corresponding to the character T included in the factor f4 and the two read character feature vectors v (T) with the combiner M to form a character string. The feature vector vf4 is generated.

Since the tuple string corresponding to the factor f5 is (f3, C), the character string feature vector generation unit 20 reads the character belonging to the factor f3 and reads out two character feature vectors v (T) corresponding to the character T. .. Then, the character string feature vector generation unit 20 combines the character feature vector v (C) corresponding to the character C included in the factor f5 and the two read character feature vectors v (T) with the combiner M to form a character string. The feature vector vf5 is generated.

Since the tuple string corresponding to the factor f6 is (f0, G), the character string feature vector generation unit 20 generates the character string feature vector vf6 using the character feature vector v (G) corresponding to the character G.

Since the tuple string corresponding to the factor f7 is (f0, A), the character string feature vector generation unit 20 generates the character string feature vector vf1 using the character feature vector v (A) corresponding to the character A.

The character string feature vector generation unit 20 generates a character string feature vector according to the above procedure. In neural networks, it is often easy for the input layer to accept vector inputs. Then, in the information processing apparatus 1 according to the first embodiment, the character string feature vector generated by the character string feature vector generation unit 20 is input to the prediction unit 21, and the prediction value is output from the prediction unit 21. That is, in the information processing apparatus 1 according to the first embodiment, the grammar-based compression character string D1 is input to the prediction processing unit 11, and the character string included in the grammar-based compression character string D1 is interpreted based on the grammar-based compression rule to character string features. Generate a vector. As a result, the information processing apparatus 1 according to the first embodiment outputs the predicted value of the character string included in the grammar-based compressed character string D1 without decompressing the grammar-based compressed character string D1.

From the above description, the information processing apparatus 1 according to the first embodiment decompresses the grammatically compressed character string by selecting the character feature vector to be applied by the character string feature vector generation unit 20 based on the grammar compression rule. Interpret without. As a result, in the information processing apparatus 1 according to the first embodiment, the character string feature vector corresponding to the grammar-based compression character string D1 is performed without separately inputting the grammar-based compression rule at the time of compression processing and without performing decompression processing. Can be generated. Then, the information processing apparatus 1 according to the first embodiment outputs the predicted value of the character string included in the grammar-based compressed character string D1 by inputting the character string feature vector into the prediction unit 21. As a result, in the information processing apparatus 1 according to the first embodiment, the time required to calculate the predicted value of the character string included in the grammar-based compressed character string D1 can be shortened.

In the verification by the inventors, when the information processing device 1 according to the first embodiment and the prediction unit 21 that inputs the grammar-based compressed character string D1 after decompression are compared, the prediction accuracy (correct answer rate of prediction) is It was about the same. Further, in the verification by the inventors, the processing speed of the information processing apparatus 1 according to the first embodiment is predicted by the prediction unit 21 in which the decompression process of the grammar-based compressed character string D1 and the decompressed grammar-based compressed character string D1 are input. The processing could be completed at a processing speed of about 3 times that of the total time of the processing.

In a system that handles big data, the big data to be processed is often stored in storage in a compressed state. When performing prediction processing on such big data, for example, when the technique described in Patent Document 1 is used, data is read from the storage, the read data is decompressed, and it is predicted that further compression processing is performed after decompression. Must be processed. That is, when the prediction processing for the compressed data on the storage is performed by using the technique described in Patent Document 1, there arises a problem that it takes more time than the prediction processing for the uncompressed data. On the other hand, in the information processing apparatus 1 according to the first embodiment, the prediction process can be performed on the big data stored in the compressed state on the storage without performing the decompression process. The effect of speeding up the processing becomes very large.

Embodiment 2
In the second embodiment, the information processing device 2 which is another form of the information processing device 1 according to the first embodiment will be described. In the description of the second embodiment, the same components as those of the first embodiment are designated by the same reference numerals as those of the first embodiment, and the description thereof will be omitted.

FIG. 3 shows a block diagram of the information processing device 2 according to the second embodiment. As shown in FIG. 3, the information processing device 2 according to the second embodiment is obtained by adding the dependency extraction unit 13 to the information processing device 1 according to the first embodiment. The dependency extraction unit 13 extracts the dependency relationships between the factors constituting the grammar-based compressed character string D1 read by the input unit 10. Then, in the second embodiment, the character string feature vector generation unit 20 generates the character string feature vector based on the dependency relationships extracted by the dependency extraction unit 13.

Here, the dependency extracted by the dependency extraction unit 13 in the information processing apparatus 2 according to the second embodiment and the operation of the character string feature vector generation unit 20 using the dependency will be described with reference to specific examples. .. Therefore, FIG. 4 shows a diagram illustrating one specific example of processing in the information processing apparatus according to the second embodiment. In the example shown in FIG. 4, the dependency is extracted by using the dependency extraction unit 13 of the second embodiment for the example of the character string shown in FIG.

In the example shown in FIG. 4, the factor f3 includes the element of the factor f2, and the factors f4 and f5 include the element of the factor f3. That is, the factor f3 has a dependency relationship with the factor f2, and the factors f4 and f5 have a dependency relationship with the factor f3.

Then, based on the above-mentioned dependency relationship, in the second embodiment, the character string feature vector generation unit 20 combines the character string feature vector vf2 corresponding to the factor f2 and the character feature vector v (T) corresponding to the character T. The character string feature vector vf3 corresponding to the factor f3 is generated by combining with the device M.

The character string feature vector generation unit 20 combines the character string feature vector vf3 corresponding to the factor f3 and the character feature vector v (T) corresponding to the character T with the combiner M, and the character string feature vector corresponding to the factor f4. Generate vf4.

The character string feature vector generation unit 20 combines the character string feature vector vf3 corresponding to the factor f3 and the character feature vector v (C) corresponding to the character C with the combiner M, and the character string feature vector corresponding to the factor f5. Generate vf5.

In the information processing apparatus 2 according to the second embodiment, the combination of vector values to be combined by the combiner M can be simplified by extracting the dependency between the factors by the dependency extraction unit 13. As a result, the information processing device 2 according to the second embodiment can perform the prediction processing at a higher speed than the information processing device 1 according to the first embodiment.

Embodiment 3
In the third embodiment, the information processing device 3 which is another form of the information processing device 1 according to the first embodiment will be described. In the description of the third embodiment, the same components as those of the first and second embodiments are designated by the same reference numerals as those of the first embodiment, and the description thereof will be omitted.

FIG. 5 shows a block diagram of the information processing apparatus 3 according to the third embodiment. As shown in FIG. 5, the information processing apparatus 3 according to the third embodiment is obtained by adding a loss calculation unit 14 and a parameter learning unit 15 to the information processing apparatus 2 according to the second embodiment. The loss calculation unit 14 outputs a difference value between the teacher data D3 prepared in advance and the predicted value output from the prediction processing unit 11 when the grammar-based compressed character string D1 is input. The parameter learning unit 15 updates the predictor parameters so that the difference value output by the loss calculation unit 14 becomes small.

From the above description, in the information processing apparatus 3 according to the third embodiment, the correct answer rate of the predicted value when the grammar-based compressed character string D1 is input can be improved by updating the predictor parameter based on the teacher data D3. it can.

Embodiment 4
In the fourth embodiment, the information processing device 4 which is another form of the information processing device 1 according to the first embodiment will be described. In the description of the fourth embodiment, the same components as those of the first and second embodiments are designated by the same reference numerals as those of the first embodiment, and the description thereof will be omitted.

FIG. 6 shows a block diagram of the information processing apparatus 4 according to the fourth embodiment. As shown in FIG. 6, the information processing device 4 according to the fourth embodiment has a prediction processing unit 16 instead of the prediction processing unit 11 of the information processing device 1. Further, the prediction processing unit 16 includes a grouping unit 30, a character string feature vector generation unit 31, and a prediction unit 21.

The grouping unit 30 calculates the number of stages of the dependency for each factor based on the dependency between the factors of the grammar-based compressed character string D1 extracted by the dependency extraction unit 13, and groups the factors having the same number of stages. The character string feature vector generation unit 31 calculates the character string feature vector in parallel for the factors belonging to the same group based on the result of grouping by the grouping unit 30. Then, the prediction processing unit 16 inputs the character string feature vector generated by the character string feature vector generation unit 31 into the prediction unit 21 to obtain a prediction value.

Here, the grouping process of the grouping unit 30 will be described. Therefore, FIG. 7 shows a diagram illustrating one specific example of the grouping process in the information processing apparatus 4 according to the fourth embodiment. Note that FIG. 7 is an example of the same character string as the example of the character string shown in FIG.

In the example shown in FIG. 7, the factor f3 depends on the factor f2. Further, the factors f4 and f5 depend on the factors f3 and f2. On the other hand, the factors f1, f2, f6, and f7 can calculate the character string feature vector without depending on other factors. Therefore, the grouping unit 30 classifies the character string feature vectors vf1, vf2, vf6, and vf7, which do not depend on other factors, into group 1 having 0 dependent stages. Further, the grouping unit 30 classifies the character string feature vector vf3 corresponding to the factor f3 in which the number of dependent stages is one into group 2. Further, the grouping unit 30 classifies the character string feature vectors vf4 and vf5 corresponding to the factors f4 and f5 having two dependent stages into group 3.

Then, in the fourth embodiment, the character string feature vector generation unit 31 calculates the character string feature vector in parallel for each group.

From the above description, the information processing device 4 according to the fourth embodiment can perform the prediction processing faster than the information processing device 1 according to the first embodiment by calculating the character string feature vector by parallel processing. it can.

The present invention is not limited to the above embodiment, and can be appropriately modified without departing from the spirit.

In the above example, the program can be stored and supplied to a computer using various types of non-transitory computer readable medium. Non-temporary computer-readable media include various types of tangible storage media. Examples of non-temporary computer-readable media include magnetic recording media (eg, flexible disks, magnetic tapes, hard disk drives), magneto-optical recording media (eg, magneto-optical disks), CD-ROMs (Read Only Memory), CD-Rs, It includes a CD-R / W and a semiconductor memory (for example, a mask ROM, a PROM (Programmable ROM), an EPROM (Erasable PROM), a flash ROM, and a RAM (RandomAccessMemory)). The program may also be supplied to the computer by various types of temporary computer readable medium. Examples of temporary computer-readable media include electrical, optical, and electromagnetic waves. The temporary computer-readable medium can supply the program to the computer via a wired communication path such as an electric wire and an optical fiber, or a wireless communication path.

1 to 4 Information processing device 10 Input unit 11 Prediction processing unit 12 Output unit 13 Dependency extraction unit 14 Loss calculation unit 15 Parameter learning unit 16

Prediction processing unit

20, 31 Character string feature vector generation unit 21 Prediction unit 30 Grouping unit D1 Grammar Compressed string D2 Parameter database D3 Teacher data

Claims

A parameter database that stores at least the combiner parameters, which are the operation setting values of the combiner used to calculate the vector value corresponding to the character string included in the compressed data generated by grammar-based compression, and the predictor parameters.
The grammar-based compression rule is interpreted from the compressed data, the character string feature vector corresponding to the character string included in the compressed data is generated by using the grammar-based compression rule and the combiner parameter, and the predictor parameter is used. A prediction processing unit that outputs the predicted value of the character string from the character string feature vector, and
Information processing device with.
The prediction processing unit
A character string feature vector generation unit that generates the character string feature vector by combining the character feature vectors stored in the parameter database based on the combiner parameters.
A predictor that outputs the predicted value of the character string by inputting the character string feature vector, and
The information processing apparatus according to claim 1.
It also has a dependency extraction unit that extracts the dependencies between the factors that make up the character string.
The information processing apparatus according to claim 2, wherein the character string feature vector generation unit combines vector values of other factors having a dependency relationship with the character feature vector to generate the character string feature vector.
The prediction processing unit further has a grouping unit that calculates the number of stages of the dependency for each factor based on the dependency between the factors and groups the factors of the same number of stages.
The information processing apparatus according to claim 3, wherein the character string feature vector generation unit calculates the character string feature vector in parallel with respect to the grouped factors.
The predicted value output by the prediction processing unit is compared with the teacher data corresponding to the character string input when calculating the predicted value, and the difference value between the predicted value and the teacher data is output. Loss calculation unit and
The information processing apparatus according to any one of claims 1 to 4, further comprising a parameter learning unit that updates the predictor parameters so that the difference value becomes smaller.
The information processing device according to any one of claims 1 to 5, wherein the prediction processing unit outputs a predicted value of the character string using a neural network.
It is a character string prediction method after grammar-based compression that predicts the character string included in the compressed data generated by performing grammar-based compression processing on the character string using a computer.
Enter the compressed data generated by applying grammar-based compression to the character string,
The rule of the grammar-based code is interpreted from the compressed data, and the character feature vector is applied to the character string included in the compressed data to generate the character string feature vector.
A method of predicting a character string after grammar-based code that outputs a predicted value of the character string indicated by the character string feature vector using a predictor parameter.
A computer-readable medium containing a program that predicts the character string contained in the compressed data generated by performing grammar-based compression processing on the character string using a computer's arithmetic function.
The program
Read the compressed data generated by applying grammar compression to the character string,
The rule of the grammar-based code is interpreted from the compressed data, and the character feature vector is applied to the character string included in the compressed data to generate the character string feature vector.
A computer-readable medium containing a program that outputs a predicted value of the character string indicated by the character string feature vector using predictor parameters.