WO2021140639A1 - Information processing device, method for predicting character string after grammatical compression, and computer-readable medium storing program therefor - Google Patents


Info

Publication number
WO2021140639A1
WO2021140639A1 · PCT/JP2020/000629 · JP2020000629W
Authority
WO
WIPO (PCT)
Prior art keywords
character string
feature vector
grammar
information processing
character
Prior art date
Application number
PCT/JP2020/000629
Other languages
French (fr)
Japanese (ja)
Inventor
耀一 佐々木 (Yoichi Sasaki)
康佑 秋元 (Kosuke Akimoto)
Original Assignee
日本電気株式会社 (NEC Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社 (NEC Corporation)
Priority to PCT/JP2020/000629
Publication of WO2021140639A1

Classifications

    • H — ELECTRICITY
    • H03 — ELECTRONIC CIRCUITRY
    • H03M — CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M 7/00 — Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M 7/30 — Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M 7/40 — Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code

Definitions

  • The present invention relates to an information processing device, a method for predicting a character string after grammar-based compression, and a computer-readable medium storing the program therefor.
  • In particular, it relates to an information processing device that takes grammar-compressed character string data as input and predicts the character string contained in that data, to a method for predicting a character string after grammar-based compression, and to a computer-readable medium storing the program therefor.
  • Patent Document 1 discloses an example in which a sequence is compressed to generate compressed data and learning processing is performed using that compressed data, and Non-Patent Document 1 discloses a machine learning task that uses compressed image data.
  • More specifically, in Patent Document 1, a compression unit generates a context-free grammar that compresses the input symbol sequence, and a model generation unit generates a syntax tree representing the context-free grammar and sets initial values for the model parameters. Based on the model parameters and the syntax tree, a learning unit then calculates, for each non-terminal symbol and each topic, the inner probability, i.e. the probability that terminal symbols are obtained by tracing the syntax tree from an inner node that corresponds to the non-terminal symbol and whose latent variable is assigned the topic, and the outer expected value, i.e. the expected number of times the topic is assigned to the latent variable of the inner node corresponding to the non-terminal symbol. The model parameters are updated based on the calculated inner probabilities and outer expected values, and this process is repeated until a predetermined termination condition is satisfied.
  • In Non-Patent Document 1, for JPEG data obtained by lossy compression of an image, a machine learning task is disclosed in which the Y, Cb, and Cr representations produced partway through the conversion back to RGB are fed to a neural network, without decompressing all of the data.
  • One aspect of the information processing device includes: a parameter database that stores at least a combiner parameter, which is an operation setting value of the combiner used to calculate vector values corresponding to character strings contained in compressed data generated by grammar-based compression, and a predictor parameter; and a prediction processing unit that interprets the grammar-based compression rules from the compressed data, generates a character string feature vector corresponding to the character string contained in the compressed data using those rules and the combiner parameter, and outputs a predicted value of the character string from the character string feature vector using the predictor parameter.
  • One aspect of the method for predicting a character string after grammar-based compression uses a computer to predict the character string contained in compressed data generated by applying grammar-based compression processing to the character string. Compressed data generated by grammar-based compression is input, the grammar-based compression rules are interpreted from the compressed data, character feature vectors are applied to the character string contained in the compressed data to generate a character string feature vector, and a predicted value of the character string indicated by the character string feature vector is output using the predictor parameter.
  • One aspect of the computer-readable medium stores a program that uses the computational functions of a computer to predict the character string contained in compressed data generated by applying grammar-based compression processing to the character string. The program reads the compressed data, interprets the grammar-based compression rules from it, applies character feature vectors to the character string contained in the compressed data to generate a character string feature vector, and outputs a predicted value of the character string indicated by the character string feature vector using the predictor parameter.
  • According to the information processing device, the method for predicting a character string after grammar-based compression, and the computer-readable medium storing the program, grammar-compressed character string data can be given to the predictor without decompression processing.
  • FIG. 1 is a block diagram of the information processing device according to Embodiment 1.
  • FIG. 2 is a diagram explaining one specific example of processing in the information processing device according to Embodiment 1.
  • FIG. 3 is a block diagram of the information processing device according to Embodiment 2.
  • FIG. 4 is a diagram explaining one specific example of processing in the information processing device according to Embodiment 2.
  • FIG. 5 is a block diagram of the information processing device according to Embodiment 3.
  • FIG. 6 is a block diagram of the information processing device according to Embodiment 4.
  • FIG. 7 is a diagram explaining one specific example of processing in the information processing device according to Embodiment 4.
  • FIG. 8 is a diagram explaining another specific example of processing in the information processing device according to Embodiment 4.
  • FIG. 1 shows a block diagram of the information processing apparatus according to the first embodiment.
  • The information processing apparatus 1 according to the first embodiment includes an input unit 10, a prediction processing unit 11, an output unit 12, and a parameter database D2.
  • Compressed data (for example, a grammar-compressed character string D1) is input to the information processing apparatus 1 from another database or from an external source.
  • The information processing apparatus 1 is, for example, an arithmetic device such as a computer, and the functions of the input unit 10, the prediction processing unit 11, and the output unit 12 are realized by executing a program.
  • The information processing apparatus 1 may be connected to another computer or a database via a network, or may operate independently without using a network.
  • The input unit 10 reads the grammar-compressed character string D1 from a storage device (not shown).
  • Possible storage devices include a hard disk, an SSD (Solid State Drive), and non-volatile memory.
  • The output unit 12 outputs the predicted value produced by the prediction processing unit 11 to another storage device.
  • The parameter database D2 stores the character feature vectors, the combiner parameter, and the predictor parameter used by the prediction processing unit 11.
  • The prediction processing unit 11 predicts the character string contained in the grammar-compressed character string D1 read by the input unit 10. One characteristic feature is that the prediction processing unit 11 predicts this character string without decompressing D1. More specifically, the prediction processing unit 11 uses the combiner parameter to interpret the grammar-based compression rules from D1, generates a character string feature vector corresponding to the character string contained in the compressed data, and then outputs a predicted value of the character string from that feature vector using the predictor parameter.
  • The prediction processing unit 11 has a character string feature vector generation unit 20 and a prediction unit 21.
  • The character string feature vector generation unit 20 applies the character feature vectors stored in the parameter database D2 to the grammar-compressed character string D1 to generate the character string feature vector.
  • In doing so, the character string feature vector generation unit 20 selects the character feature vectors to apply to the characters composing the character string contained in D1 based on the grammar-based compression rules of D1.
  • The prediction unit 21 is, for example, an arithmetic unit using a neural network technique such as an LSTM (Long Short-Term Memory). Not only recurrent neural networks such as LSTMs but also various other types of neural networks, such as feedforward and convolutional networks, can be applied as the prediction unit 21; a sketch of such a unit follows below.
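The patent does not fix the network architecture or its dimensions. As a point of reference only, the following is a minimal PyTorch sketch of a prediction unit in the spirit of the description above; the feature dimension, hidden size, and two-class head are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class PredictionUnit(nn.Module):
    """Minimal stand-in for the prediction unit 21: an LSTM consumes the
    sequence of character string feature vectors and a linear head emits
    the predicted value. All sizes are illustrative assumptions."""

    def __init__(self, feat_dim: int = 16, hidden_dim: int = 32, num_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)  # "predictor parameters"

    def forward(self, vf_seq: torch.Tensor) -> torch.Tensor:
        # vf_seq: (batch, num_factors, feat_dim), one vector per factor f1..fn
        _, (h_n, _) = self.lstm(vf_seq)
        return self.head(h_n[-1])  # predicted value from the final hidden state

# Example: a batch holding one sequence of seven 16-dimensional factor vectors.
logits = PredictionUnit()(torch.randn(1, 7, 16))
```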
  • FIG. 2 shows a diagram illustrating one specific example of processing in the information processing apparatus according to the first embodiment.
  • In the example shown in FIG. 2, the character string "ATTTTTTTTCGA" is used as the string to be compressed.
  • This string is, for example, one used as part of a DNA base sequence.
  • However, the character string is not limited to base sequences; it may be ordinary language such as words or sentences.
  • The grammar-compressed character string D1 is a character string to which grammar-based compression processing has been applied.
  • Known grammar-based compression methods include LZ78, LZW, and LZD.
  • In the example shown in FIG. 2, the target character string is grammar-compressed with LZ78.
  • In LZ78, a factor is assigned to each phrase formed by extending a previously registered run of consecutive characters by one character.
  • When the string belonging to a target factor appears for the first time, a tuple is formed from the empty factor f0 and the character belonging to the target factor. In the example shown in FIG. 2, the factors f1, f2, f6, and f7 are therefore replaced with tuples combining the factor f0 and the character contained in the target factor.
  • Since the factor f3 is the combination of the factor f2 and the character T, it is replaced after compression with the tuple of f2 and T.
  • Since the factor f4 is the combination of the factor f3 and the character T, it is replaced after compression with the tuple of f3 and T.
  • Since the factor f5 is the combination of the factor f3 and the character C, it is replaced after compression with the tuple of f3 and C.
  • The compressed character string obtained in this way is called a tuple string. This tuple string is input to the prediction processing unit 11 as the grammar-compressed character string D1; a sketch of the parsing that produces it follows below.
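To make the tuple string concrete, here is a minimal LZ78 parser sketch. One assumption to note: end-of-input handling varies between LZ78 implementations, and this version emits a bare back-reference for a final phrase that is already registered, whereas the patent's FIG. 2 writes the final "A" as the tuple (f0, A).

```python
def lz78_parse(s: str) -> list[tuple[int, str]]:
    """Parse s into an LZ78 tuple string [(factor_index, new_char), ...].
    Indices refer to the factors f0, f1, ...; f0 is the empty string."""
    dictionary = {"": 0}              # phrase -> factor index
    tuples = []
    phrase = ""
    for ch in s:
        if phrase + ch in dictionary:
            phrase += ch              # keep extending a known phrase
        else:
            tuples.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # register the new factor
            phrase = ""
    if phrase:                        # input ended inside a known phrase
        tuples.append((dictionary[phrase], ""))
    return tuples

print(lz78_parse("ATTTTTTTTCGA"))
# -> [(0, 'A'), (0, 'T'), (2, 'T'), (3, 'T'), (3, 'C'), (0, 'G'), (1, '')]
```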
  • Then, in the information processing apparatus 1 according to the first embodiment, the character string feature vector generation unit 20 applies the character feature vectors stored in the parameter database D2, based on the grammar-based compression rules of D1, to generate the character string feature vector. Using the combiner M, the generation unit combines multiple character feature vectors to produce the character string feature vectors vf1 to vf7, which are the string components corresponding to the factors f1 to f7. The information processing apparatus 1 then combines vf1 to vf7 and inputs them to the prediction unit 21.
  • In the example shown in FIG. 2, since the tuple corresponding to the factor f1 is (f0, A), the character string feature vector generation unit 20 generates the character string feature vector vf1 using the character feature vector v(A) corresponding to the character A.
  • Since the tuple corresponding to the factor f2 is (f0, T), the generation unit generates vf2 using the character feature vector v(T) corresponding to the character T.
  • Since the tuple corresponding to the factor f3 is (f2, T), the generation unit interprets the characters belonging to the factor f2 and reads out the character feature vector v(T) corresponding to the character T. It then combines the v(T) corresponding to the character T contained in f3 with the read v(T) using the combiner M to generate vf3.
  • Since the tuple corresponding to the factor f4 is (f3, T), the generation unit interprets the characters belonging to the factor f3 and reads out two character feature vectors v(T). It then combines the v(T) corresponding to the character T contained in f4 with the two read v(T) vectors using the combiner M to generate vf4.
  • Since the tuple corresponding to the factor f5 is (f3, C), the generation unit interprets the characters belonging to the factor f3 and reads out two character feature vectors v(T). It then combines the v(C) corresponding to the character C contained in f5 with the two read v(T) vectors using the combiner M to generate vf5.
  • Since the tuple corresponding to the factor f6 is (f0, G), the generation unit generates vf6 using the character feature vector v(G) corresponding to the character G.
  • Since the tuple corresponding to the factor f7 is (f0, A), the generation unit generates vf7 using the character feature vector v(A) corresponding to the character A. A sketch of this procedure follows below.
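The patent specifies neither the dimensionality of the feature vectors nor the functional form of the combiner M (whose parameters live in the parameter database D2). The numpy sketch below therefore uses an assumed 16-dimensional space and a fixed tanh-of-linear-mix fold purely as a placeholder combiner; only the control flow mirrors the description, building each vf directly from the tuple string and re-reading the characters of the prefix factor as in this embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16
v = {c: rng.normal(size=DIM) for c in "ATGC"}  # character feature vectors v(c)

# Placeholder combiner M: folds a list of character feature vectors into one
# string feature vector. A real system would load its parameters from D2.
Wa = rng.normal(size=(DIM, DIM))
Wb = rng.normal(size=(DIM, DIM))
def combine(vectors):
    acc = np.zeros(DIM)
    for x in vectors:
        acc = np.tanh(Wa @ acc + Wb @ x)
    return acc

def feature_vectors(tuples):
    """Embodiment-1 style: for each factor, re-read the characters belonging
    to its prefix factor and combine all character vectors with M."""
    chars = {0: []}                           # characters belonging to each factor
    vf = []
    for i, (prefix, ch) in enumerate(tuples, start=1):
        chars[i] = chars[prefix] + [ch]       # e.g. f4 -> ['T', 'T', 'T']
        vf.append(combine([v[c] for c in chars[i]]))
    return vf

# Tuple string as written in FIG. 2 (final factor given there as (f0, 'A')).
tuple_string = [(0, "A"), (0, "T"), (2, "T"), (3, "T"), (3, "C"), (0, "G"), (0, "A")]
vf1_to_vf7 = feature_vectors(tuple_string)    # computed without decompression
```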
  • The character string feature vector generation unit 20 generates the character string feature vector according to the above procedure. In neural networks, the input layer can often accept vector inputs easily. In the information processing apparatus 1 according to the first embodiment, the character string feature vector generated by the generation unit 20 is input to the prediction unit 21, and the prediction unit 21 outputs the predicted value. That is, the grammar-compressed character string D1 is input to the prediction processing unit 11, and the character string contained in D1 is interpreted based on the grammar-based compression rules to generate the character string feature vector. As a result, the information processing apparatus 1 outputs the predicted value of the character string contained in D1 without decompressing D1.
  • As described above, the information processing apparatus 1 according to the first embodiment interprets the grammar-compressed character string without decompressing it, because the character string feature vector generation unit 20 selects the character feature vectors to apply based on the grammar-based compression rules. The apparatus can therefore generate the character string feature vector corresponding to D1 without separately receiving the compression rules used at compression time and without any decompression processing. By inputting the character string feature vector to the prediction unit 21, the apparatus outputs the predicted value of the character string contained in D1, which shortens the time required to calculate that predicted value.
  • In the inventors' verification, the prediction accuracy (rate of correct predictions) of the information processing apparatus 1 was comparable to that of a prediction unit 21 fed the decompressed string, and the processing completed roughly three times faster than the total time of decompressing D1 and running the prediction on the decompressed result.
  • In systems that handle big data, the data to be processed is often stored in a compressed state.
  • When prediction processing is performed on such data using, for example, the technique described in Patent Document 1, the data must first be read from storage and decompressed before the prediction can be processed. That is, prediction on compressed data in storage then takes longer than prediction on uncompressed data.
  • In contrast, the information processing apparatus 1 according to the first embodiment can run prediction directly on big data stored in compressed form, without decompression processing, so the speed-up of the prediction processing becomes very large.
  • Embodiment 2. In the second embodiment, an information processing device 2, which is another form of the information processing device 1 according to the first embodiment, is described. In the description of the second embodiment, components identical to those of the first embodiment are given the same reference numerals and their description is omitted.
  • FIG. 3 shows a block diagram of the information processing device 2 according to the second embodiment.
  • The information processing device 2 according to the second embodiment adds a dependency extraction unit 13 to the information processing device 1 according to the first embodiment.
  • The dependency extraction unit 13 extracts the dependency relationships between the factors composing the grammar-compressed character string D1 read by the input unit 10.
  • The character string feature vector generation unit 20 then generates the character string feature vector based on the dependencies extracted by the dependency extraction unit 13.
  • FIG. 4 shows a diagram illustrating one specific example of processing in the information processing device according to the second embodiment. In the example shown in FIG. 4, dependencies have been extracted by the dependency extraction unit 13 for the same character string as in FIG. 2.
  • In the example shown in FIG. 4, the factor f3 includes the elements of the factor f2, and the factors f4 and f5 include the elements of the factor f3. That is, f3 has a dependency on f2, and f4 and f5 have a dependency on f3.
  • Based on these dependencies, the character string feature vector generation unit 20 combines the character string feature vector vf2 corresponding to the factor f2 and the character feature vector v(T) corresponding to the character T with the combiner M to generate the character string feature vector vf3 corresponding to the factor f3.
  • The generation unit 20 combines vf3 and the character feature vector v(T) with the combiner M to generate the character string feature vector vf4 corresponding to the factor f4.
  • The generation unit 20 combines vf3 and the character feature vector v(C) with the combiner M to generate the character string feature vector vf5 corresponding to the factor f5.
  • In this way, by having the dependency extraction unit 13 extract the dependencies between factors, the combinations of vector values to be merged by the combiner M can be simplified. The information processing device 2 according to the second embodiment can therefore perform the prediction processing faster than the information processing device 1 according to the first embodiment; a sketch follows below.
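Continuing the numpy sketch above (same v, Wa, Wb, DIM, and tuple_string), the following hedged sketch illustrates the Embodiment 2 idea: extract the factor-to-factor dependencies from the tuple string, then reuse each prefix factor's already computed vector so that the combiner only ever merges two vectors per factor.

```python
def extract_dependencies(tuples):
    """Dependency edges: each factor fi depends on its non-empty prefix factor."""
    return {i: prefix
            for i, (prefix, _) in enumerate(tuples, start=1) if prefix != 0}

def combine2(prefix_vec, char_vec):
    # Two-input form of the placeholder combiner M from the sketch above.
    return np.tanh(Wa @ prefix_vec + Wb @ char_vec)

def feature_vectors_memo(tuples):
    """Embodiment-2 style: vf(fi) = M(vf(prefix), v(char)), reusing the prefix
    factor's vector instead of re-reading its characters."""
    vf = {0: np.zeros(DIM)}                  # f0 is the empty string
    for i, (prefix, ch) in enumerate(tuples, start=1):
        vf[i] = combine2(vf[prefix], v[ch])  # e.g. vf4 = M(vf3, v('T'))
    return vf

print(extract_dependencies(tuple_string))    # -> {3: 2, 4: 3, 5: 3}
vf = feature_vectors_memo(tuple_string)
```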
  • Embodiment 3. In the third embodiment, an information processing device 3, which is another form of the information processing device 1 according to the first embodiment, is described. In the description of the third embodiment, components identical to those of the first and second embodiments are given the same reference numerals and their description is omitted.
  • FIG. 5 shows a block diagram of the information processing apparatus 3 according to the third embodiment.
  • The information processing apparatus 3 according to the third embodiment adds a loss calculation unit 14 and a parameter learning unit 15 to the information processing apparatus 2 according to the second embodiment.
  • The loss calculation unit 14 outputs the difference value between teacher data D3 prepared in advance and the predicted value output by the prediction processing unit 11 when the grammar-compressed character string D1 is input.
  • The parameter learning unit 15 updates the predictor parameters so that the difference value output by the loss calculation unit 14 becomes small.
  • By updating the predictor parameters based on the teacher data D3 in this way, the correct answer rate of the predicted value for an input grammar-compressed character string D1 can be improved; a training-step sketch follows below.
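The patent does not name a loss function or an update rule. Assuming a classification task, the following PyTorch sketch reuses the PredictionUnit class from the sketch earlier in this document, with cross-entropy standing in for the loss calculation unit 14 and plain SGD standing in for the parameter learning unit 15; both choices are assumptions, not details from the patent.

```python
import torch
import torch.nn as nn

model = PredictionUnit()            # prediction unit 21, sketched earlier
loss_fn = nn.CrossEntropyLoss()     # stand-in for the loss calculation unit 14
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # parameter learning unit 15

def training_step(vf_seq: torch.Tensor, teacher: torch.Tensor) -> float:
    logits = model(vf_seq)           # predicted value from the feature vectors
    loss = loss_fn(logits, teacher)  # difference value vs. teacher data D3
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                 # update so that the difference becomes small
    return loss.item()

# Example: one sequence of seven 16-dimensional factor vectors with label 1.
print(training_step(torch.randn(1, 7, 16), torch.tensor([1])))
```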
  • Embodiment 4. In the fourth embodiment, an information processing device 4, which is another form of the information processing device 1 according to the first embodiment, is described. In the description of the fourth embodiment, components identical to those of the first and second embodiments are given the same reference numerals and their description is omitted.
  • FIG. 6 shows a block diagram of the information processing apparatus 4 according to the fourth embodiment.
  • The information processing device 4 according to the fourth embodiment has a prediction processing unit 16 in place of the prediction processing unit 11 of the information processing device 1.
  • The prediction processing unit 16 includes a grouping unit 30, a character string feature vector generation unit 31, and a prediction unit 21.
  • The grouping unit 30 calculates the number of dependency stages for each factor, based on the dependencies between the factors of the grammar-compressed character string D1 extracted by the dependency extraction unit 13, and groups factors with the same number of stages.
  • The character string feature vector generation unit 31 calculates the character string feature vectors in parallel for factors belonging to the same group, based on the grouping result. The prediction processing unit 16 then inputs the generated character string feature vectors to the prediction unit 21 to obtain the predicted value.
  • FIG. 7 shows a diagram illustrating one specific example of the grouping process in the information processing device 4 according to the fourth embodiment. FIG. 7 uses the same character string as the example shown in FIG. 2.
  • In the example shown in FIG. 7, the factor f3 depends on the factor f2, and the factors f4 and f5 depend on the factors f3 and f2. The factors f1, f2, f6, and f7, on the other hand, can have their character string feature vectors calculated without depending on other factors. The grouping unit 30 therefore classifies the feature vectors vf1, vf2, vf6, and vf7, which depend on no other factor, into group 1 with zero dependency stages; the feature vector vf3, corresponding to the factor f3 with one dependency stage, into group 2; and the feature vectors vf4 and vf5, corresponding to the factors f4 and f5 with two dependency stages, into group 3.
  • The character string feature vector generation unit 31 then calculates the character string feature vectors in parallel within each group.
  • By calculating the character string feature vectors with such parallel processing, the information processing device 4 according to the fourth embodiment can perform the prediction processing faster than the information processing device 1 according to the first embodiment; a grouping sketch follows below.
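A hedged sketch of the grouping idea, continuing the earlier numpy sketches (same v, DIM, combine2, and tuple_string): the number of dependency stages of a factor is one more than that of its prefix factor, factors with equal stage counts form one group, and each group's vectors can then be computed independently. The plain loop below marks where a real system might dispatch each group to a thread pool or a GPU batch.

```python
from collections import defaultdict

def group_by_depth(tuples):
    """Group factor indices by their number of dependency stages."""
    depth = {0: -1}                   # f0 (the empty string) counts no stage
    groups = defaultdict(list)
    for i, (prefix, _) in enumerate(tuples, start=1):
        depth[i] = depth[prefix] + 1  # one more stage than the prefix factor
        groups[depth[i]].append(i)
    return dict(groups)

groups = group_by_depth(tuple_string)
print(groups)                         # -> {0: [1, 2, 6, 7], 1: [3], 2: [4, 5]}

# Factors in the same group share no mutual dependencies, so their feature
# vectors can be computed in parallel, group by group in increasing stage count.
vf = {0: np.zeros(DIM)}
for d in sorted(groups):
    for i in groups[d]:               # this inner loop is parallelizable
        prefix, ch = tuple_string[i - 1]
        vf[i] = combine2(vf[prefix], v[ch])
```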
  • Non-transitory computer-readable media include various types of tangible storage media.
  • Examples of non-transitory computer-readable media include magnetic recording media (e.g., flexible disks, magnetic tapes, hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROMs (Read Only Memory), CD-Rs, CD-R/Ws, and semiconductor memories (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (Random Access Memory)).
  • The program may also be supplied to the computer by various types of transitory computer-readable media.
  • Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer-readable medium can supply the program to the computer via a wired communication path, such as an electric wire or an optical fiber, or via a wireless communication path.
  • Reference signs: 1 Information processing device; 10 Input unit; 11 Prediction processing unit; 12 Output unit; 13 Dependency extraction unit; 14 Loss calculation unit; 15 Parameter learning unit; 16 Prediction processing unit; 20, 31 Character string feature vector generation unit; 21 Prediction unit; 30 Grouping unit; D1 Grammar-compressed character string; D2 Parameter database; D3 Teacher data

Abstract

Conventional devices have had a problem in that it is difficult to perform prediction processing without decompressing compressed data. This information processing device includes: a parameter database (D2) that stores at least a combiner parameter, which is an operation setting value of the combiner used to calculate vector values corresponding to a character string included in compressed data (D1) generated by grammar-based compression, and a predictor parameter; and a prediction processing unit (11) that interprets the grammar-based compression rules from the compressed data (D1), uses those rules and the combiner parameter to generate a character string feature vector corresponding to the character string included in the compressed data (D1), and uses the predictor parameter to output a predicted value of the character string from the character string feature vector.

Description

Information processing device, method for predicting a character string after grammatical compression, and computer-readable medium storing the program therefor
The present invention relates to an information processing device, a method for predicting a character string after grammar-based compression, and a computer-readable medium storing the program therefor. In particular, it relates to an information processing device that takes grammar-compressed character string data as input and predicts the character string contained in that data, to a method for predicting a character string after grammar-based compression, and to a computer-readable medium storing the program therefor.
In recent years, big data has been compressed before storage in order to keep its enormous volume within the smallest possible storage capacity. To recognize the contents of such compressed data, prediction processing must be performed after the data is decompressed. However, when prediction processing is performed on such compressed data, the preprocessing time for decompression grows in proportion to the data volume. Against this background, Patent Document 1 discloses an example in which a sequence is compressed to generate compressed data and learning processing is performed using that compressed data, and Non-Patent Document 1 discloses a machine learning task that uses compressed image data.
More specifically, in Patent Document 1, a compression unit generates a context-free grammar that compresses the input symbol sequence, and a model generation unit generates a syntax tree representing the context-free grammar and sets initial values for the model parameters. Based on the model parameters and the syntax tree, a learning unit then calculates, for each non-terminal symbol and each topic, the inner probability, i.e. the probability that terminal symbols are obtained by tracing the syntax tree from an inner node that corresponds to the non-terminal symbol and whose latent variable is assigned the topic, and the outer expected value, i.e. the expected number of times the topic is assigned to the latent variable of the inner node corresponding to the non-terminal symbol. The model parameters are updated based on the calculated inner probabilities and outer expected values, and this process is repeated until a predetermined termination condition is satisfied.
Further, in Non-Patent Document 1, for JPEG data obtained by lossy compression of an image, a machine learning task is disclosed in which the Y, Cb, and Cr representations produced partway through the conversion back to RGB are fed to a neural network, without decompressing all of the data.
[Patent Document 1] Japanese Unexamined Patent Publication No. 2016-212742
However, when character string data is grammar-compressed, the compression scheme differs from that of JPEG data, so the method proposed in Non-Patent Document 1 cannot solve the problem even if grammar-compressed character string data is fed to the input of the neural network. Furthermore, although Patent Document 1 targets symbol sequences, it must use the context-free grammar generated by its own compression unit, so it cannot process compressed data produced by common compression schemes.
One aspect of the information processing device according to the present invention includes: a parameter database that stores at least a combiner parameter, which is an operation setting value of the combiner used to calculate vector values corresponding to character strings contained in compressed data generated by grammar-based compression, and a predictor parameter; and a prediction processing unit that interprets the grammar-based compression rules from the compressed data, generates a character string feature vector corresponding to the character string contained in the compressed data using those rules and the combiner parameter, and outputs a predicted value of the character string from the character string feature vector using the predictor parameter.
One aspect of the method according to the present invention for predicting a character string after grammar-based compression uses a computer to predict the character string contained in compressed data generated by applying grammar-based compression processing to the character string. Compressed data generated by grammar-based compression is input, the grammar-based compression rules are interpreted from the compressed data, character feature vectors are applied to the character string contained in the compressed data to generate a character string feature vector, and a predicted value of the character string indicated by the character string feature vector is output using the predictor parameter.
One aspect of the computer-readable medium according to the present invention stores a program that uses the computational functions of a computer to predict the character string contained in compressed data generated by applying grammar-based compression processing to the character string. The program reads the compressed data, interprets the grammar-based compression rules from it, applies character feature vectors to the character string contained in the compressed data to generate a character string feature vector, and outputs a predicted value of the character string indicated by the character string feature vector using the predictor parameter.
According to the information processing device according to the present invention, the method for predicting a character string after grammar-based compression, and the computer-readable medium storing the program, grammar-compressed character string data can be given to the predictor without decompression processing.
FIG. 1 is a block diagram of the information processing device according to Embodiment 1.
FIG. 2 is a diagram explaining one specific example of processing in the information processing device according to Embodiment 1.
FIG. 3 is a block diagram of the information processing device according to Embodiment 2.
FIG. 4 is a diagram explaining one specific example of processing in the information processing device according to Embodiment 2.
FIG. 5 is a block diagram of the information processing device according to Embodiment 3.
FIG. 6 is a block diagram of the information processing device according to Embodiment 4.
FIG. 7 is a diagram explaining one specific example of processing in the information processing device according to Embodiment 4.
FIG. 8 is a diagram explaining another specific example of processing in the information processing device according to Embodiment 4.
Embodiment 1
Hereinafter, embodiments of the present invention will be described with reference to the drawings. FIG. 1 shows a block diagram of the information processing device according to the first embodiment. As shown in FIG. 1, the information processing device 1 according to the first embodiment includes an input unit 10, a prediction processing unit 11, an output unit 12, and a parameter database D2. Compressed data (for example, a grammar-compressed character string D1) is input to the information processing device 1 from another database or from an external source.
The information processing device 1 is, for example, an arithmetic device such as a computer, and the functions of the input unit 10, the prediction processing unit 11, and the output unit 12 are realized by executing a program. The information processing device 1 may be connected to another computer or a database via a network, or may operate independently without using a network.
The input unit 10 reads the grammar-compressed character string D1 from a storage device (not shown). Possible storage devices include a hard disk, an SSD (Solid State Drive), and non-volatile memory. The output unit 12 outputs the predicted value produced by the prediction processing unit 11 to another storage device. The parameter database D2 stores the character feature vectors, the combiner parameter, and the predictor parameter used by the prediction processing unit 11.
The prediction processing unit 11 predicts the character string contained in the grammar-compressed character string D1 read by the input unit 10. One characteristic feature is that the prediction processing unit 11 predicts this character string without decompressing D1. More specifically, the prediction processing unit 11 uses the combiner parameter to interpret the grammar-based compression rules from D1, generates a character string feature vector corresponding to the character string contained in the compressed data, and outputs a predicted value of the character string from that feature vector using the predictor parameter.
The prediction processing unit 11 has a character string feature vector generation unit 20 and a prediction unit 21. The character string feature vector generation unit 20 applies the character feature vectors stored in the parameter database D2 to the grammar-compressed character string D1 to generate the character string feature vector. In doing so, the generation unit 20 selects the character feature vectors to apply to the characters composing the character string contained in D1 based on the grammar-based compression rules of D1. The prediction unit 21 is, for example, an arithmetic unit using a neural network technique such as an LSTM (Long Short-Term Memory). Not only recurrent neural networks such as LSTMs but also various other types of neural networks, such as feedforward and convolutional networks, can be applied as the prediction unit 21.
Here, the grammar-compressed character string D1 and the prediction processing applied to it are described. FIG. 2 shows a diagram illustrating one specific example of processing in the information processing device according to the first embodiment.
In the example shown in FIG. 2, the character string "ATTTTTTTTCGA" is used as the string to be compressed. This string is, for example, one used as part of a DNA base sequence. However, the character string is not limited to base sequences; it may be ordinary language such as words or sentences.
The grammar-compressed character string D1 is a character string to which grammar-based compression processing has been applied. Known grammar-based compression methods include LZ78, LZW, and LZD. In the example shown in FIG. 2, the target character string is grammar-compressed with LZ78. As shown in FIG. 2, LZ78 assigns a factor to each phrase formed by extending a previously registered run of consecutive characters by one character.
In the example shown in FIG. 2, "A" is assigned to the factor f1, "T" to the factor f2, "TT" to the factor f3, "TTT" to the factor f4, "TTC" to the factor f5, "G" to the factor f6, and "A" to the factor f7. In addition, LZ78 sets the factor f0 as the empty string of length 0.
Then, in LZ78, when the string belonging to a target factor is a character appearing for the first time, or when that string has not been assigned to another factor as a consecutive string, a tuple combining the factor f0 and the character belonging to the target factor is set. In the example shown in FIG. 2, the factors f1, f2, f6, and f7 are replaced with tuples combining the factor f0 and the character contained in the target factor.
On the other hand, in the example shown in FIG. 2, since the factor f3 is the combination of the factor f2 and the character T, it is replaced after compression with the tuple of f2 and T. Since the factor f4 is the combination of the factor f3 and the character T, it is replaced after compression with the tuple of f3 and T. Since the factor f5 is the combination of the factor f3 and the character C, it is replaced after compression with the tuple of f3 and C.
In this way, grammar-based compression assigns a factor to each combination of consecutive characters that appears, and replaces any factor that occurs more than once with a pair of a factor and a character. The compressed character string obtained in this way is called a tuple string. In the information processing device 1 according to the first embodiment, this tuple string is input to the prediction processing unit 11 as the grammar-compressed character string D1.
Then, in the information processing device 1 according to the first embodiment, the character string feature vector generation unit 20 applies the character feature vectors stored in the parameter database D2, based on the grammar-based compression rules of D1, to generate the character string feature vector. Using the combiner M, the generation unit 20 combines multiple character feature vectors to produce the character string feature vectors vf1 to vf7 corresponding to the factors f1 to f7. The information processing device 1 then combines vf1 to vf7 and inputs them to the prediction unit 21.
In the example shown in FIG. 2, since the tuple corresponding to the factor f1 is (f0, A), the character string feature vector generation unit 20 generates the character string feature vector vf1 using the character feature vector v(A) corresponding to the character A.
Since the tuple corresponding to the factor f2 is (f0, T), the generation unit 20 generates vf2 using the character feature vector v(T) corresponding to the character T.
Since the tuple corresponding to the factor f3 is (f2, T), the generation unit 20 interprets the characters belonging to the factor f2 and reads out the character feature vector v(T) corresponding to the character T. It then combines the v(T) corresponding to the character T contained in f3 with the read v(T) using the combiner M to generate vf3.
Since the tuple corresponding to the factor f4 is (f3, T), the generation unit 20 interprets the characters belonging to the factor f3 and reads out two character feature vectors v(T). It then combines the v(T) corresponding to the character T contained in f4 with the two read v(T) vectors using the combiner M to generate vf4.
Since the tuple corresponding to the factor f5 is (f3, C), the generation unit 20 interprets the characters belonging to the factor f3 and reads out two character feature vectors v(T). It then combines the v(C) corresponding to the character C contained in f5 with the two read v(T) vectors using the combiner M to generate vf5.
Since the tuple corresponding to the factor f6 is (f0, G), the generation unit 20 generates vf6 using the character feature vector v(G) corresponding to the character G.
Since the tuple corresponding to the factor f7 is (f0, A), the generation unit 20 generates vf7 using the character feature vector v(A) corresponding to the character A.
The character string feature vector generation unit 20 generates the character string feature vector according to the above procedure. In neural networks, the input layer can often accept vector inputs easily. In the information processing device 1 according to the first embodiment, the character string feature vector generated by the generation unit 20 is input to the prediction unit 21, and the prediction unit 21 outputs the predicted value. That is, the grammar-compressed character string D1 is input to the prediction processing unit 11, and the character string contained in D1 is interpreted based on the grammar-based compression rules to generate the character string feature vector. As a result, the information processing device 1 outputs the predicted value of the character string contained in D1 without decompressing D1.
As described above, the information processing device 1 according to the first embodiment interprets the grammar-compressed character string without decompressing it, because the character string feature vector generation unit 20 selects the character feature vectors to apply based on the grammar-based compression rules. The device can therefore generate the character string feature vector corresponding to D1 without separately receiving the compression rules used at compression time and without any decompression processing. By inputting the character string feature vector to the prediction unit 21, the device outputs the predicted value of the character string contained in D1. This shortens the time required to calculate that predicted value.
In the inventors' verification, the prediction accuracy (rate of correct predictions) of the information processing device 1 according to the first embodiment was comparable to that of a prediction unit 21 fed the decompressed grammar-compressed character string D1. Moreover, the device completed its processing roughly three times faster than the total time of decompressing D1 and running the prediction unit 21 on the decompressed result.
In systems that handle big data, the data to be processed is often stored in a compressed state. When prediction processing is performed on such big data using, for example, the technique described in Patent Document 1, the data must be read from storage and decompressed before the prediction can be performed. That is, prediction on compressed data in storage using the technique of Patent Document 1 takes longer than prediction on uncompressed data. In contrast, the information processing device 1 according to the first embodiment can run prediction directly on big data stored in compressed form, without decompression processing, so the speed-up of the prediction processing becomes very large.
 Embodiment 2
 In the second embodiment, an information processing apparatus 2, which is another form of the information processing apparatus 1 according to the first embodiment, will be described. In the description of the second embodiment, the same components as those of the first embodiment are given the same reference numerals as in the first embodiment, and their description is omitted.
 FIG. 3 shows a block diagram of the information processing apparatus 2 according to the second embodiment. As shown in FIG. 3, the information processing apparatus 2 according to the second embodiment is obtained by adding a dependency extraction unit 13 to the information processing apparatus 1 according to the first embodiment. The dependency extraction unit 13 extracts the dependencies between the factors constituting the grammar-compressed character string D1 read by the input unit 10. In the second embodiment, the character string feature vector generation unit 20 then generates the character string feature vector based on the dependencies extracted by the dependency extraction unit 13.
 Here, the dependencies extracted by the dependency extraction unit 13 in the information processing apparatus 2 according to the second embodiment, and the operation of the character string feature vector generation unit 20 using those dependencies, will be described with a specific example. FIG. 4 illustrates one specific example of processing in the information processing apparatus according to the second embodiment. In the example shown in FIG. 4, the dependency extraction unit 13 of the second embodiment has been applied to the example character string shown in FIG. 2.
 In the example shown in FIG. 4, the factor f3 includes an element of the factor f2, and the factors f4 and f5 include elements of the factor f3. That is, the factor f3 has a dependency on the factor f2, and the factors f4 and f5 have dependencies on the factor f3.
 Based on these dependencies, in the second embodiment the character string feature vector generation unit 20 combines the character string feature vector vf2 corresponding to the factor f2 with the character feature vector v(T) corresponding to the character T using the combiner M, to generate the character string feature vector vf3 corresponding to the factor f3.
 The character string feature vector generation unit 20 combines the character string feature vector vf3 corresponding to the factor f3 with the character feature vector v(T) corresponding to the character T using the combiner M, to generate the character string feature vector vf4 corresponding to the factor f4.
 The character string feature vector generation unit 20 likewise combines the character string feature vector vf3 corresponding to the factor f3 with the character feature vector v(C) corresponding to the character C using the combiner M, to generate the character string feature vector vf5 corresponding to the factor f5.
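 Continuing the sketch above, the dependency extraction and the combinations of FIG. 4 can be written as follows. The rules for f1, f2, f6, and f7 are not given in this excerpt, so single terminal characters are assumed for them; only f3, f4, and f5 follow the figure.

```python
# Hypothetical factor table: f3, f4 and f5 follow FIG. 4; the rest are assumed terminals.
factors = {
    "f1": "G", "f2": "A",
    "f3": ("f2", "T"),   # f3 depends on f2
    "f4": ("f3", "T"),   # f4 depends on f3
    "f5": ("f3", "C"),   # f5 depends on f3
    "f6": "A", "f7": "C",
}

def extract_dependencies(factors):
    """Dependency extraction unit 13: list, per factor, the factors it refers to."""
    return {fid: ([] if isinstance(rule, str) else [x for x in rule if x in factors])
            for fid, rule in factors.items()}

deps = extract_dependencies(factors)  # {'f3': ['f2'], 'f4': ['f3'], 'f5': ['f3'], ...}
vf = string_feature_vector(factors)   # vf3 = M(vf2, v(T)), vf4 = M(vf3, v(T)), vf5 = M(vf3, v(C))
```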
 In the information processing apparatus 2 according to the second embodiment, the dependency extraction unit 13 extracts the dependencies between factors, which simplifies the combinations of vector values that the combiner M must process. As a result, the information processing apparatus 2 according to the second embodiment can perform prediction processing faster than the information processing apparatus 1 according to the first embodiment.
 Embodiment 3
 In the third embodiment, an information processing apparatus 3, which is another form of the information processing apparatus 1 according to the first embodiment, will be described. In the description of the third embodiment, the same components as those of the first and second embodiments are given the same reference numerals as in the first embodiment, and their description is omitted.
 FIG. 5 shows a block diagram of the information processing apparatus 3 according to the third embodiment. As shown in FIG. 5, the information processing apparatus 3 according to the third embodiment is obtained by adding a loss calculation unit 14 and a parameter learning unit 15 to the information processing apparatus 2 according to the second embodiment. The loss calculation unit 14 outputs the difference between teacher data D3 prepared in advance and the predicted value that the prediction processing unit 11 outputs when the grammar-compressed character string D1 is input. The parameter learning unit 15 updates the predictor parameters so that the difference value output by the loss calculation unit 14 becomes smaller.
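 As a rough illustration of the loss calculation unit 14 and the parameter learning unit 15, the following continues the sketch above with an assumed scalar predictor and squared-error loss; the actual predictor structure, loss function, and update rule are not specified at this level of description, so these are illustrative choices.

```python
theta = rng.normal(size=DIM)   # predictor parameters (initial values)
teacher_D3 = 1.0               # assumed teacher data for this training example

for _ in range(100):
    pred = vf["f4"] @ theta           # predicted value from one factor's feature vector
    diff = pred - teacher_D3          # loss calculation unit 14: difference value
    theta -= 0.1 * diff * vf["f4"]    # parameter learning unit 15: gradient step on 0.5*diff**2
```

 Gradient descent on a squared error is one natural realization of "updating the predictor parameters so that the difference value becomes smaller".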
 As described above, in the information processing apparatus 3 according to the third embodiment, updating the predictor parameters based on the teacher data D3 improves the rate of correct predictions when the grammar-compressed character string D1 is input.
 Embodiment 4
 In the fourth embodiment, an information processing apparatus 4, which is another form of the information processing apparatus 1 according to the first embodiment, will be described. In the description of the fourth embodiment, the same components as those of the first and second embodiments are given the same reference numerals as in the first embodiment, and their description is omitted.
 FIG. 6 shows a block diagram of the information processing apparatus 4 according to the fourth embodiment. As shown in FIG. 6, the information processing apparatus 4 according to the fourth embodiment has a prediction processing unit 16 in place of the prediction processing unit 11 of the information processing apparatus 1. The prediction processing unit 16 includes a grouping unit 30, a character string feature vector generation unit 31, and a prediction unit 21.
 The grouping unit 30 calculates the number of dependency stages for each factor based on the dependencies between the factors of the grammar-compressed character string D1 extracted by the dependency extraction unit 13, and groups together the factors having the same number of stages. Based on the grouping result of the grouping unit 30, the character string feature vector generation unit 31 calculates the character string feature vectors of the factors belonging to the same group in parallel. The prediction processing unit 16 then inputs the character string feature vectors generated by the character string feature vector generation unit 31 to the prediction unit 21 to obtain a predicted value.
 Here, the grouping process of the grouping unit 30 will be described. FIG. 7 illustrates one specific example of the grouping process in the information processing apparatus 4 according to the fourth embodiment. FIG. 7 uses the same character string as the example shown in FIG. 2.
 In the example shown in FIG. 7, the factor f3 depends on the factor f2, and the factors f4 and f5 depend on the factors f3 and f2. The character string feature vectors of the factors f1, f2, f6, and f7, on the other hand, can be calculated without depending on any other factor. The grouping unit 30 therefore classifies the character string feature vectors vf1, vf2, vf6, and vf7, which do not depend on other factors, into group 1 with zero dependency stages. The grouping unit 30 classifies the character string feature vector vf3, corresponding to the factor f3 with one dependency stage, into group 2, and the character string feature vectors vf4 and vf5, corresponding to the factors f4 and f5 with two dependency stages, into group 3.
 In the fourth embodiment, the character string feature vector generation unit 31 then calculates the character string feature vectors within each group in parallel.
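 The stage counting and grouping can be sketched as follows, continuing the example above; assigning stage 0 to factors with no dependencies reproduces the three groups of FIG. 7. The recursive depth computation is an assumed realization, not taken from the patent.

```python
def group_by_depth(deps):
    """Grouping unit 30: stage count is 0 for independent factors, otherwise
    1 + the maximum stage count among the factors depended on."""
    depth = {}
    def d(fid):
        if fid not in depth:
            depth[fid] = 0 if not deps[fid] else 1 + max(d(x) for x in deps[fid])
        return depth[fid]
    groups = {}
    for fid in deps:
        groups.setdefault(d(fid), []).append(fid)
    return groups

groups = group_by_depth(deps)
# {0: ['f1', 'f2', 'f6', 'f7'], 1: ['f3'], 2: ['f4', 'f5']}: groups 1, 2 and 3 of FIG. 7.
# Factors within one group have no mutual dependencies, so their feature vectors
# can be computed in parallel, e.g. as one batched combiner call per group.
```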
 As described above, the information processing apparatus 4 according to the fourth embodiment calculates the character string feature vectors by parallel processing, and can therefore perform prediction processing faster than the information processing apparatus 1 according to the first embodiment.
 The present invention is not limited to the above embodiments and can be modified as appropriate without departing from its spirit.
 In the above examples, the program can be stored using various types of non-transitory computer-readable media and supplied to a computer. Non-transitory computer-readable media include various types of tangible storage media. Examples of non-transitory computer-readable media include magnetic recording media (e.g., flexible disks, magnetic tapes, hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memories (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory)). The program may also be supplied to a computer by various types of transitory computer-readable media. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer-readable medium can supply the program to a computer via a wired communication path such as an electric wire or an optical fiber, or via a wireless communication path.
 1 to 4 Information processing apparatus
 10 Input unit
 11 Prediction processing unit
 12 Output unit
 13 Dependency extraction unit
 14 Loss calculation unit
 15 Parameter learning unit
 16 Prediction processing unit
 20, 31 Character string feature vector generation unit
 21 Prediction unit
 30 Grouping unit
 D1 Grammar-compressed character string
 D2 Parameter database
 D3 Teacher data

Claims (8)

  1.  An information processing apparatus comprising:
     a parameter database that stores at least a combiner parameter, which is an operation setting value of a combiner used to calculate vector values corresponding to a character string included in compressed data generated by grammar compression, and a predictor parameter; and
     a prediction processing unit that interprets a grammar compression rule from the compressed data, generates a character string feature vector corresponding to the character string included in the compressed data using the grammar compression rule and the combiner parameter, and outputs a predicted value of the character string from the character string feature vector using the predictor parameter.
  2.  The information processing apparatus according to claim 1, wherein the prediction processing unit comprises:
     a character string feature vector generation unit that combines character feature vectors stored in the parameter database based on the combiner parameter to generate the character string feature vector; and
     a predictor that receives the character string feature vector as input and outputs the predicted value of the character string.
  3.  The information processing apparatus according to claim 2, further comprising a dependency extraction unit that extracts dependencies between factors constituting the character string,
     wherein the character string feature vector generation unit combines the vector value of another factor having the dependency with the character feature vector to generate the character string feature vector.
  4.  The information processing apparatus according to claim 3, wherein the prediction processing unit further comprises a grouping unit that calculates the number of dependency stages for each factor based on the dependencies between the factors and groups together the factors having the same number of stages, and
     wherein the character string feature vector generation unit calculates the character string feature vectors of the grouped factors in parallel.
  5.  The information processing apparatus according to any one of claims 1 to 4, further comprising:
     a loss calculation unit that compares the predicted value output by the prediction processing unit with teacher data corresponding to the character string input when the predicted value was calculated, and outputs a difference value between the predicted value and the teacher data; and
     a parameter learning unit that updates the predictor parameter so that the difference value becomes smaller.
  6.  The information processing apparatus according to any one of claims 1 to 5, wherein the prediction processing unit outputs the predicted value of the character string using a neural network.
  7.  A method for predicting a character string after grammar compression, the method using a computer to predict the character string included in compressed data generated by performing grammar compression on the character string, the method comprising:
     receiving compressed data generated by performing grammar compression on a character string;
     interpreting the grammar compression rule from the compressed data and applying character feature vectors to the character string included in the compressed data to generate a character string feature vector; and
     outputting, using a predictor parameter, a predicted value of the character string indicated by the character string feature vector.
  8.  A computer-readable medium storing a program for predicting, using the arithmetic functions of a computer, a character string included in compressed data generated by performing grammar compression on the character string, the program causing the computer to:
     read compressed data generated by performing grammar compression on a character string;
     interpret the grammar compression rule from the compressed data and apply character feature vectors to the character string included in the compressed data to generate a character string feature vector; and
     output, using a predictor parameter, a predicted value of the character string indicated by the character string feature vector.
PCT/JP2020/000629 2020-01-10 2020-01-10 Information processing device, method for predicting character string after grammatical compression, and computer-readable medium storing program therefor WO2021140639A1 (en)

Priority Applications (1)

Application Number: PCT/JP2020/000629
Priority Date: 2020-01-10
Filing Date: 2020-01-10
Title: Information processing device, method for predicting character string after grammatical compression, and computer-readable medium storing program therefor


Publications (1)

Publication Number: WO2021140639A1 (en)

Family ID: 76787784


Country Status (1): WO



Patent Citations (2)

* Cited by examiner, † Cited by third party
JPH09246987A * (Toshiba Advanced Syst Kk), priority 1996-03-04, published 1997-09-19: Data compression device and data retrieval device whose object is data compressed in the compression device
JP2016212742A * (Nippon Telegraph and Telephone Corporation), priority 2015-05-12, published 2016-12-15: Data analysis device, method, program, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
MEND, HANG HO ET AL.: "A Dictionary-based Compressed Pattern Matching Algorithm", Proceedings of the 26th Annual International Computer Software and Applications Conference, 26 August 2002, XP010611224, retrieved from <https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1045116> *
PEREZ, A. CARLOS ET AL.: "Approximate Searching on Compressed Text", Proceedings of the 15th International Conference on Electronics, Communications and Computers (CONIELECOMP'05), 28 February 2005, XP010820841, retrieved from <https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1488570> *

Cited By (2)

* Cited by examiner, † Cited by third party
CN114095390A * (Beijing Baidu Netcom Science and Technology Co., Ltd.), priority 2021-11-11, published 2022-02-25: Method and device for predicting object flow in area, electronic equipment and storage medium
CN114095390B * (Beijing Baidu Netcom Science and Technology Co., Ltd.), priority 2021-11-11, published 2023-10-13: Method, device, equipment and storage medium for predicting flow of objects in area


Legal Events

Code 121 (Ep: the epo has been informed by wipo that ep was designated in this application): Ref document number: 20912016; Country of ref document: EP; Kind code of ref document: A1
Code NENP (Non-entry into the national phase): Ref country code: DE
Code 122 (Ep: pct application non-entry in european phase): Ref document number: 20912016; Country of ref document: EP; Kind code of ref document: A1
Code NENP (Non-entry into the national phase): Ref country code: JP