WO2001006660A1 - Method for reducing the amount of data and method for generating a reduced amount of data - Google Patents

Info

Publication number
WO2001006660A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
extracting
predetermined
blocks
same
Prior art date
Application number
PCT/JP2000/004756
Other languages
English (en)
Japanese (ja)
Inventor
Kouki Hara
Motoshi Kimura
Original Assignee
Vertex Software Co.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vertex Software Co.
Priority to AU60174/00A (AU6017400A)
Publication of WO2001006660A1

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40: Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/42: Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code using table look-up for the coding or decoding process, e.g. using read-only memory

Definitions

  • the present invention relates to a data reduction method and a data reduction system, and more particularly, to a data reduction method and a data reduction system in which the amount of data to be transmitted is reduced to improve transmission efficiency.
  • a digital content distribution system is a software system that manages a data recording device that can record digital data through information transmission means such as a satellite system or a line.
  • the content distribution means transmits the request.
  • a collation code for collating the identification code with the time-axis-compressed digital data of the content software corresponding to the request is delivered to the data recording device through the information transmission means.
  • This data recording device expands the digital data and outputs it to the output device only when the received collation code is matched with its own identification code.
  • the decompression device is characterized by expanding the digital data compressed and recorded in the data recording device and reproducing the content software.
  • the information processing device and the communication method are communication methods between the first and second information processing devices connected by a communication network, wherein the first information processing device transmits the second information processing device to the second information processing device.
  • a receiving step of receiving the data transmitted in the transmitting step in the first information processing apparatus by adding Restoring and storing this.
  • Many of these conventional systems employ compression techniques that require, for decompression, the same software that was used for compression, and they have a compression ratio of about 10%.
  • This compression method requires the same software for both compression and decompression, and both operations are performed by the user.
  • the conventional compression method has a problem that the transmission efficiency cannot be improved because the software compression ratio is several tens of percent.
  • a data compression method for compressing data that has been reduced by a data reduction method, a data reduction system, and a data reduction method according to the present invention, and a method for compressing compressed data
  • the data reduction method, the data transmission system and the data recording system for reducing the capacity are to be configured as shown below.
  • the search for extracting the same portion is characterized in that the data to be searched is divided into data blocks of an arbitrary size, and the data blocks are connected in an arbitrary order and collated.
  • the data reduction method described in (1) is characterized in that the data to be searched is divided into data blocks of an arbitrary size, and the data blocks are connected in an arbitrary order and collated.
  • the search for extracting the same part is characterized in that the data to be searched is divided into data blocks of an arbitrary size, and matching is performed on data obtained by connecting any number of the data blocks.
  • the data reduction method according to (1) is characterized in that the data to be searched is divided into data blocks of an arbitrary size, and matching is performed on data obtained by connecting any number of the data blocks.
  • the data to be searched is divided into data blocks of an arbitrary size, and a higher priority is given to a data block having a large number of reference appearances.
  • the data to be searched is divided into data blocks of an arbitrary size, and the appearance order and appearance frequency of data blocks of the same data among the data blocks are obtained.
  • the predetermined mathematical formula is compared with data prepared from a pattern prepared in advance, and a familiar mathematical formula is selected.
  • the data to be searched is divided into data blocks of an arbitrary size, and collation is performed on data obtained by connecting the data blocks in an arbitrary order.
  • the search for extracting the same portion is characterized in that the data to be searched is divided into data blocks of an arbitrary size, and matching is performed on data obtained by connecting any number of the data blocks.
  • A small-capacity data generation system according to (5), wherein, in the search for extracting the same portion, the data to be searched is divided into data blocks of an arbitrary size, and a data block having a higher number of reference appearances is given a higher priority.
  • the small-capacity data generation system according to (15) characterized in that:
  • the data to be searched is divided into data blocks of an arbitrary size, and the appearance order and appearance frequency of data blocks of the same data in the data blocks are obtained.
  • the search for extracting the same portion is characterized in that the data to be searched is divided into data blocks of an arbitrary size, and the data blocks are connected in an arbitrary order and collated.
  • a data compression method for compressing data that has been reduced by the data reduction method described in (29).
  • the search for extracting the same part is characterized in that the data to be searched is divided into data blocks of an arbitrary size, and matching is performed on data obtained by connecting the data blocks in an arbitrary number.
  • A data compression method for compressing data reduced in size by the data reduction method described in (29), wherein the data to be searched is divided into data blocks of an arbitrary size, and the priority of a data block having a large number of reference appearances is set higher.
  • the data to be searched is divided into data blocks of an arbitrary size, and the appearance order and the appearance frequency of the data blocks of the same data in the data blocks are determined.
  • A data compression method for compressing data reduced in size by the data reduction method according to (29), wherein the sequence of a predetermined length has a length that can be easily applied to the mathematical formula.
  • A data compression method for compressing data reduced in volume by the data reduction method according to (29) or (40), wherein the predetermined mathematical formula is compared with a graph prepared in advance, and a familiar mathematical formula is selected.
  • A method for reducing data capacity, wherein predetermined source data is compressed by a predetermined method; the compressed data is extended, starting from a predetermined data length, until the data no longer matches, and the longest identical part is extracted; the data excluding the extracted longest identical part is extended, starting from the predetermined data length, until the data no longer matches, and the next longest identical part is extracted, this step being repeated; the extracted identical parts and the remaining data are converted into a sequence of the predetermined length; and the converted data is converted into a predetermined mathematical expression using the converted data as a parameter.
  • The data reduction method for reducing the volume of the compressed data described in (43), wherein, in the search for extracting the same part, the data to be searched is divided into data blocks of an arbitrary size, and the data blocks are connected in any order and collated.
  • the data to be searched is divided into data blocks of an arbitrary size, and collation is performed on data obtained by connecting the data blocks in an arbitrary number.
  • The data reduction method for reducing the volume of the compressed data described in (43), wherein the data to be searched is divided into data blocks, and priority is given to a data block having a large number of reference appearances.
  • the data to be searched is divided into data blocks of an arbitrary size, and the appearance order and the appearance frequency of the data blocks of the same data in the data blocks are obtained.
  • the predetermined mathematical formula is a pattern data which is prepared in advance and will be generated.
  • (61) A small-capacity data reproduction system comprising: means for extracting the longest identical part by extending the data, starting from the specified source data with the specified data length, until the data no longer matches; means for repeatedly extracting the next longest identical part by extending the data excluding the extracted longest identical part, starting from a predetermined data length, until the data no longer matches; means for converting the extracted identical parts and the remaining data into a sequence having a predetermined length; means for comparing the converted sequence with the pattern data that the sequence is expected to form; means for converting the compared difference into a predetermined mathematical expression as a parameter; means for recording the reduced-capacity data converted into the predetermined mathematical expression on a predetermined recording medium; and reproduction means capable of appropriately reproducing the recorded reduced-capacity data.
  • (63) A small-capacity data reproduction system comprising: means for extracting the longest identical portion by extending the predetermined source data from the predetermined byte length until the data no longer matches; means for repeatedly extracting the next longest identical part by extending the data excluding the extracted longest identical portion, starting from the predetermined data length, until the data no longer matches; means for converting the extracted identical parts and the remaining data into a sequence of predetermined length; means for comparing the converted sequence with the pattern data that the sequence is expected to form; means for converting the compared difference into a predetermined mathematical expression as a parameter; and reproduction means capable of appropriately reproducing the small-capacity data converted into the predetermined mathematical expression.
  • FIG. 1 is an overall flow chart for realizing the data reduction method of the present invention.
  • FIG. 2 is an overall block diagram for realizing the data reduction method.
  • FIG. 3 is a block diagram showing a block generation / rule generation unit and a coding calculation unit that constitute the capacity reduction method.
  • FIG. 4 is an explanatory diagram showing a method of searching for the same block.
  • FIG. 5 is an explanatory diagram showing a method for searching for an approximate block.
  • FIG. 6 is a conceptual diagram of usage frequency measurement.
  • FIG. 7 is a conceptual diagram for detecting unique data.
  • FIG. 8 is a conceptual diagram for generating reduced-volume data using mathematical formulas.
  • BEST MODE FOR CARRYING OUT THE INVENTION
  • As shown in FIG. 1, a system for reducing data capacity takes source data 100 such as video data or music data, creates unique conversion data 210 by performing a conversion process, converts the unique conversion data 210 into a predetermined mathematical expression to generate mathematical expression conversion data 220, and thereby creates reduced-capacity data 300 whose data amount is reduced for transmission.
  • The unique conversion data 210 and the mathematical expression conversion data 220 constitute a coding calculation unit 200.
  • the created small-capacity data 300 is compressed and transmitted by a well-known compression method, or the source data 100 is compressed.
  • this small-capacity data 300 is transmitted, or the source data is compressed into reduced-capacity data and then compressed again by a compression method to create reduced-capacity data.
  • This small-capacity data 300 is transmitted to a desired destination by radio wave, light, or electric communication, or is configured to be recordable on a recording medium composed of a disk-shaped medium such as CD-ROM, a tape-shaped medium such as DAT or DDS, or a recording medium composed of an IC memory.
  • the recording medium is not limited to these, and includes, for example, using optical, magnetic, physical and chemical recording media. Further, it is needless to say that the recorded small-volume data can be reproduced and used.
  • The data reduction system having such a feature includes a unique conversion data generation unit 210 that converts source data 100 into a predetermined sequence to generate unique conversion data; a mathematical expression conversion data generation unit 220 that converts the unique conversion data into meaningful data and converts it into mathematical formulas to generate mathematical expression conversion data; and a block generation and rule generation unit 400 that extracts the characteristics of the data itself and performs block processing and rule processing to reduce the data amount.
  • The block generation and rule generation unit 400 consists of a common part extraction unit 410 for extracting a common part of data, a difference extraction unit 420, and a coding calculation unit 200.
  • The common part extraction unit 410 consists of the same part extraction unit 411 that extracts the same part of the data, the approximate part extraction unit 412 that extracts the approximate part of the data, and a data weighting unit 413 that weights the data by adding a priority based on the frequency of use of the same part.
  • The same part extraction unit 411 includes a longest same part extraction logic 414 and a different direction same part extraction logic 415.
  • The longest identical part extraction logic 414 extracts the longest identical part of the data: starting from a predetermined data length, it extends the data to be searched until it no longer matches, and extracts the longest identical part. In other words, the largest identical part can be extracted, and the more identical parts there are, the smaller the amount of data to be sent.
  • The different direction same part extraction logic 415 searches from the reverse direction and extracts the same part. When searching in the forward direction, the data is collated not only in the forward direction but also in the reverse direction.
  • The approximate part extraction unit 412 extracts a common block including a dissimilar part, and is composed of an out-of-order data detection logic 416 and a different data length detection logic 417.
  • The out-of-order data detection logic 416 detects a portion that uses the same data string in a different order: it divides the data to be searched into data blocks of an arbitrary size and performs collation on data obtained by concatenating the divided data blocks in an arbitrary order.
  • The different data length detection logic 417 detects a portion that includes the same data string but has a different total data length: it divides the search target data into data blocks of an arbitrary size and performs collation on data obtained by linking any number of the divided data blocks.
  • FIG. 5 is a conceptual diagram of the approximate part extraction. First, when there is a data block "EFGHIABC" 41, the approximate ones are "ABDEFGHI" 40 and "AABCDDDEFGHIJ" 42. Their common part, "EFGHI" 50, is the common block. By gathering approximate blocks having this common part, it is possible to set some rules there.
  • a predetermined method can be selected from the aggregate of approximate blocks without being limited to the processing of the same block.
  • a predetermined rule can be applied based on statistics and the like, and it becomes possible to reduce the capacity of the data.
  • The data weighting section 413 is composed of a use frequency measurement logic 418 for determining the priority order based on the frequency of the common block and a unique data detection logic 419 for processing data having no common block.
  • the usage frequency measurement logic 418 determines the priority based on the usage frequency, the reference block content, and the like, and divides the data to be searched into data blocks of an arbitrary size. Then, the number of appearances of the divided data blocks is counted.
  • The unique data detection logic 419 measures the similarity of data and determines the priority: it divides the data to be searched into data blocks of arbitrary size and determines the order of appearance and the frequency of use of the divided data blocks.
  • FIG. 6 is a conceptual diagram of the usage frequency measurement logic.
  • When the "A" data block is referenced by "A1" and "A2" in source data 100 and the "B" data block is referenced only by "B1", "A" is referenced from two places, so its priority is higher than that of "B", which is referenced from one place.
  • FIG. 7 is a conceptual diagram of the unique data detection logic. Data that includes the data blocks "HI" and "ys", which are common within source data 100, has a low priority, and data that does not include a common block has a lower priority.
  • The difference extraction unit 420 is composed of a common part comparison unit 421 that compares the data of the blocks having the common part with each other, and a difference comparison unit 422 that compares the differences in the dissimilar parts of the blocks having the common part.
  • The common part comparing section 421 compares the common parts to extract the difference, and is composed of a division/comparison logic 423 that divides and compares the dissimilar parts of the common block, and a repetitive block detection logic 424 for detecting repetition of the common block.
  • The division/comparison logic 423 divides the data into arbitrary data lengths and compares them with the blocks of the extracted common part: it divides the data to be searched into data blocks of arbitrary size and measures the number included in the common part.
  • The repetitive block detection logic 424 divides the data into arbitrary data lengths and detects continuity: it divides the data to be searched into data blocks of arbitrary size and detects a continuous part in a specific common part.
  • The difference comparison unit 422 consists of the similarity detection logic 425, which detects the similarity of the dissimilar part in the block including the common part, and the common part content ratio measurement logic 426, which measures the content ratio of the common part that includes the dissimilar part with respect to the common part.
  • the similarity detection logic 425 determines the priority based on the similarity between the difference data, and measures the similarity in the same manner as the common part to determine the priority.
  • The common part content ratio measurement logic 426 corrects the priority based on the ratio included in the common part: it measures the frequency of use of the common part and corrects the priority.
  • The coding calculation unit 200 includes a unique conversion data generation unit 210 that uniquely converts the data that has been blocked and ruled by the block generation and rule generation unit 400 described above, and a mathematical expression conversion data generation unit 220 that converts it into a predetermined mathematical expression.
  • The unique conversion data generation unit 210 converts the common part of the coded and ruled data into a sequence of arbitrary length.
  • The mathematical expression conversion data generation unit 220 generates predetermined data by applying a predetermined mathematical expression to the data composed of the sequence created by the unique conversion data generation unit 210 described above. Based on the fluctuation of the sequence generated by the unique conversion data generation unit 210, a predetermined sequence of appearing and disappearing trends is detected and compared with the sequence data of the tendency prepared in advance, and the most common formula is selected. Then, the difference from the pattern drawn by the selected formula is converted into a formula as a parameter.
  • FIG. 8 is a conceptual diagram for generating reduced-volume data by using a mathematical formula, wherein “A BD 8 W
  • the source data 100 is uniquely converted into a predetermined sequence to make it meaningful data, and at the same time, the statistical properties of the data are grasped by blocking and making rules, and the data
  • By grasping patterns and formulating the grasped patterns as parameters to generate reduced-volume data, all source data can be used as common blocks, trend parameters, and rule-based data. This allows the data to be reduced without truncating the data that makes up the source data, so that the source data can be completely restored.
  • the predetermined data according to the present invention is expanded from the predetermined byte length until the data no longer matches, and the longest identical portion is extracted, and the extracted longest identical portion is removed.
  • The process of extracting the next longest identical part, by expanding the data starting from the specified byte length until the data no longer matches, is repeated, and the extracted identical parts and the remaining data are converted into a sequence of predetermined length. By converting the converted sequence into a mathematical expression, using as a parameter the patternized data that would be generated in advance, the data to be transmitted can be converted into predetermined meaningful data. This has the effect of improving the transmission efficiency by reducing the amount of data that is actually transmitted.
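
The longest identical part extraction described above (start from a predetermined length, extend until the data no longer matches, exclude the extracted part, and repeat) can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the function names, the `min_len` parameter, and the choice to require non-overlapping copies are assumptions made for the example, and joining the remainder after each extraction is a simplification.

```python
from collections import defaultdict


def longest_identical_part(data: bytes, min_len: int = 3) -> bytes:
    """Return the longest byte string that occurs twice in `data` without overlap.

    Hypothetical helper: every pair of positions sharing a `min_len`-byte seed
    is extended one byte at a time until the two copies stop matching.
    """
    seeds = defaultdict(list)  # seed bytes -> list of start positions
    for i in range(len(data) - min_len + 1):
        seeds[data[i:i + min_len]].append(i)

    best = b""
    for positions in seeds.values():
        for idx, a in enumerate(positions):
            for b in positions[idx + 1:]:
                if b - a < min_len:
                    continue  # the two seeds would overlap
                length = min_len
                # Extend while both copies agree and the first ends before the second starts.
                while (b + length < len(data)
                       and a + length < b
                       and data[a + length] == data[b + length]):
                    length += 1
                if length > len(best):
                    best = data[a:a + length]
    return best


def extract_identical_parts(data: bytes, min_len: int = 3):
    """Repeatedly extract the longest identical part; return (parts, remainder)."""
    parts = []
    rest = data
    while True:
        part = longest_identical_part(rest, min_len)
        if not part:
            break
        parts.append(part)
        rest = rest.replace(part, b"")  # exclude the extracted part and search again
    return parts, rest


parts, rest = extract_identical_parts(b"xABCDEyABCDEz")
print(parts, rest)
```

The pairwise extension is quadratic in the worst case, which is acceptable for a sketch; a practical system would more likely use a suffix structure or hashing to find the matches.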

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to an operation in which predetermined source data is extended, starting from a predetermined data length, until the data no longer match, so as to extract the longest matching part. The source data minus the extracted longest matching part is then extended, starting from a predetermined data length, until the data no longer match, so as to extract the next longest matching part. The operation is repeated, and the extracted matching part and the remaining part of the data are transformed into a sequence of numbers of predetermined length. The sequence is then compared with the expected data pattern, and the deviation resulting from the comparison is converted into a parameter. Finally, the data are transformed into a predetermined mathematical expression on the basis of this parameter, to produce the reduced amount of data.
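
The comparison step in the abstract (compare the number sequence with an expected pattern and keep the deviation as a parameter) can be illustrated with a small sketch. The pattern catalogue `PATTERNS`, the functions `encode` and `decode`, and the sum-of-absolute-deviations selection rule are all assumptions introduced for this example; the source does not specify which formulas or which selection criterion are used.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical catalogue of patterns "prepared in advance": index -> expected value.
PATTERNS: Dict[str, Callable[[int], int]] = {
    "constant": lambda i: 0,
    "linear": lambda i: i,
    "quadratic": lambda i: i * i,
}


def encode(seq: List[int]) -> Tuple[str, List[int]]:
    """Pick the pattern with the smallest total deviation from `seq`.

    Returns the pattern name and the per-element differences from it.
    """
    def residuals(name: str) -> List[int]:
        f = PATTERNS[name]
        return [v - f(i) for i, v in enumerate(seq)]

    best = min(PATTERNS, key=lambda n: sum(abs(r) for r in residuals(n)))
    return best, residuals(best)


def decode(name: str, diffs: List[int]) -> List[int]:
    """Rebuild the original sequence exactly from the pattern name and differences."""
    f = PATTERNS[name]
    return [f(i) + d for i, d in enumerate(diffs)]


name, diffs = encode([1, 2, 5, 10, 17])
print(name, diffs, decode(name, diffs))
```

Storing one difference per element does not by itself shrink the data; the sketch only shows that a sequence expressed as a formula plus deviations can be restored without loss, and that a well-chosen formula leaves small, regular deviations.
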
PCT/JP2000/004756 1999-07-16 2000-07-14 Method for reducing the amount of data and method for generating a reduced amount of data WO2001006660A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU60174/00A AU6017400A (en) 1999-07-16 2000-07-14 Amount-of-data reducing method and reduced amount-of-data generating system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP20241199 1999-07-16
JP11/202411 1999-07-16

Publications (1)

Publication Number Publication Date
WO2001006660A1 (fr) 2001-01-25

Family

ID=16457070

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2000/004756 WO2001006660A1 (fr) 1999-07-16 2000-07-14 Method for reducing the amount of data and method for generating a reduced amount of data

Country Status (2)

Country Link
AU (1) AU6017400A (fr)
WO (1) WO2001006660A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0129439A1 (fr) * 1983-06-20 1984-12-27 Sperry Corporation Apparatus and method for high-speed data compression and decompression
US5016009A (en) * 1989-01-13 1991-05-14 Stac, Inc. Data compression apparatus and method
JPH05152971A (ja) * 1991-11-29 1993-06-18 Fujitsu Ltd Data compression and decompression method

Also Published As

Publication number Publication date
AU6017400A (en) 2001-02-05

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP