WO2001006660A1

WO2001006660A1 - Amount-of-data reducing method and reduced amount-of-data generating system

Info

Publication number: WO2001006660A1
Application number: PCT/JP2000/004756
Authority: WO
Inventors: Kouki Hara; Motoshi Kimura
Original assignee: Vertex Software Co.
Priority date: 1999-07-16
Filing date: 2000-07-14
Publication date: 2001-01-25
Also published as: AU6017400A

Abstract

Predetermined source data is extended, starting from a predetermined data length, until the data no longer matches, and the longest matching part is extracted. The source data excluding the extracted longest matching part is likewise extended from the predetermined data length until the data no longer matches, and the next longest matching part is extracted. These operations are repeated, and the extracted matching parts and the remaining data are transformed into a sequence of numbers of a predetermined length. The sequence of numbers is compared with patterned data expected to occur, and the difference is taken as a parameter. The data is transformed into a predetermined mathematical expression according to the parameter to generate and output data whose amount is reduced.

Description

Data reduction method and data generation system

The present invention relates to a data reduction method and a data reduction system, and more particularly to a data reduction method and a data reduction system that reduce the amount of data to be transmitted in order to improve transmission efficiency.

Background art

Prior-art techniques for compressing and distributing data include, for example, the digital content distribution system disclosed in Japanese Patent Application Laid-Open No. H11-149907 and the information processing apparatus and distribution method disclosed in Japanese Patent Application Laid-Open No. Hei 7-152686.

The digital content distribution system manages, through information transmission means such as a satellite system or a communication line, a data recording device capable of recording digital data. Content software is time-axis compressed, converted into disc data, assigned a code, and stored in content distribution means. When a request containing the code of the content software to be distributed and the identification code of the data recording device is transmitted to the content distribution means, the content distribution means delivers to the data recording device, through the information transmission means, a collation code for collating the identification code together with the time-axis-compressed digital data of the content software corresponding to the request. The data recording device expands the digital data and outputs it to the output device only when the received collation code matches its own identification code. The decompression device expands the digital data compressed and recorded in the data recording device and reproduces the content software.

Further, the information processing device and communication method concern communication between first and second information processing devices connected by a communication network. The method comprises a requesting step in which the first information processing device requests data from the second information processing device; an acquisition step in which the second information processing device, having received the request, acquires the data to be provided; a compression step of compressing the acquired data according to the degree of congestion in the communication network; a transmitting step of adding, to the compressed data, information indicating its compression state and transmitting it; and a receiving step in which the first information processing device receives the transmitted data, restores it, and stores it.

Many of these conventional systems employ compression techniques that require, for decompression, the software used for compression, and that achieve a compression ratio of about 10%. In addition, because the same software is required for both compression and decompression, both operations must be performed by the user.

However, the conventional compression method has a problem that the transmission efficiency cannot be improved because the software compression ratio is several tens of percent.

Therefore, the problem to be solved is how to improve the efficiency of data transmission.

Disclosure of the invention

In order to solve the above-mentioned problems, the data reduction method, the reduced-capacity data generation system, the data compression method for compressing data reduced by the data reduction method, the data reduction method for reducing the capacity of compressed data, the data transmission system, and the data recording system according to the present invention are configured as follows.

(1) A data reduction method characterized by: extending predetermined source data, starting from a predetermined data length, until the data no longer matches, and extracting the longest identical part; repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; converting the extracted identical parts and the remaining data into a sequence of predetermined length; comparing the converted sequence with patterned data that is likely to occur and taking the difference as a parameter; and converting the data into a predetermined mathematical formula based on the parameter to generate and output reduced-capacity data.

(2) The data reduction method according to (1), wherein, in the search for extracting the identical part, the target data is collated in both the forward direction and the backward direction.

(3) The data reduction method according to (1), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and collation is performed on data obtained by concatenating the data blocks in an arbitrary order.

(4) The data reduction method according to (1), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and collation is performed on data obtained by concatenating an arbitrary number of the data blocks.

(5) The data reduction method according to (1), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and a higher priority is given to a data block having a larger number of reference appearances.

(6) The data reduction method according to (1), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, the appearance order and appearance frequency of data blocks holding the same data among the data blocks are obtained, and priorities are set accordingly.

(7) The data reduction method according to (1), wherein, after the data blocks having the identical part are extracted, common data included in the data blocks is extracted and the number of occurrences of the common data is counted.

(8) The data reduction method according to (1), wherein, after the data blocks having the identical part are extracted, a repeated or continuous portion of identical or similar data blocks is detected from among the extracted data blocks.

(9) The data reduction method according to (1), wherein, after the data blocks having the identical part are extracted, priorities are set in descending order of the use frequency of identical or similar data blocks among the extracted data blocks.

(10) The data reduction method according to (1), wherein, after the data blocks having the identical part are extracted, the priorities set according to the use frequency of identical or similar data blocks among the extracted data blocks are corrected.

(11) The data reduction method according to (1), wherein the sequence of the predetermined length has a length that can be easily applied to the mathematical formula.

(12) The data reduction method according to (1), wherein the patterned data that is likely to occur is generated based on a change in the sequence.

(13) The data reduction method according to (1) or (12), wherein the predetermined mathematical formula is selected by comparison with patterned data prepared in advance, so that a well-fitting mathematical formula is chosen.

(14) The data reduction method according to (13), wherein the difference between the selected mathematical formula and the patterned data that is likely to occur is used as a parameter when converting the sequence into the mathematical formula.

(15) A reduced-capacity data generation system comprising: means for extending predetermined source data, starting from a predetermined data length, until the data no longer matches, and extracting the longest identical part; means for repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; means for converting the extracted identical parts and the remaining data into a sequence of predetermined length; means for comparing the converted sequence with patterned data that is likely to occur; and means for converting the data into a predetermined mathematical formula using the compared difference as a parameter.

(16) The reduced-capacity data generation system according to (15), wherein, in the search for extracting the identical part, the target data is collated in both the forward direction and the backward direction.

(17) The reduced-capacity data generation system according to (15), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and collation is performed on data obtained by concatenating the data blocks in an arbitrary order.

(18) The reduced-capacity data generation system according to (15), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and collation is performed on data obtained by concatenating an arbitrary number of the data blocks.

(19) The reduced-capacity data generation system according to (15), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and a higher priority is given to a data block having a larger number of reference appearances.

(20) The reduced-capacity data generation system according to (15), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, the appearance order and appearance frequency of data blocks holding the same data among the data blocks are obtained, and priorities are set accordingly.

(21) The reduced-capacity data generation system according to (15), wherein, after the data blocks having the identical part are extracted, common data included in the data blocks is extracted and the number of occurrences of the common data is counted.

(22) The reduced-capacity data generation system according to (15), wherein, after the data blocks having the identical part are extracted, a repeated or continuous portion of identical or similar data blocks is detected from among the extracted data blocks.

(23) The reduced-capacity data generation system according to (15), wherein, after the data blocks having the identical part are extracted, priorities are set in descending order of the use frequency of identical or similar data blocks among the extracted data blocks.

(24) The reduced-capacity data generation system according to (15), wherein, after the data blocks having the identical part are extracted, the priorities set according to the use frequency of identical or similar data blocks among the extracted data blocks are corrected.

(25) The reduced-capacity data generation system according to (15), wherein the sequence of the predetermined length has a length that can be easily applied to the mathematical formula.

(26) The reduced-capacity data generation system according to (15), wherein the patterned data that is likely to occur is generated based on a change in the sequence.

(27) The reduced-capacity data generation system according to (15) or (26), wherein the predetermined mathematical formula is selected by comparison with patterned data prepared in advance, so that a well-fitting mathematical formula is chosen.

(28) The reduced-capacity data generation system according to (27), wherein the difference between the selected mathematical formula and the patterned data that is likely to occur is used as a parameter when converting the sequence into the mathematical formula.

(29) A data compression method for compressing data reduced by a data reduction method, characterized by: extending predetermined source data, starting from a predetermined data length, until the data no longer matches, and extracting the longest identical part; repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; converting the extracted identical parts and the remaining data into a sequence of predetermined length; converting the converted sequence into a predetermined mathematical formula based on patterned data that is likely to occur; and compressing the converted data by a predetermined method.

(30) The data compression method according to (29), wherein, in the search for extracting the identical part, the target data is collated in both the forward direction and the backward direction.

(31) The data compression method according to (29), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and collation is performed on data obtained by concatenating the data blocks in an arbitrary order.

(32) The data compression method according to (29), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and collation is performed on data obtained by concatenating an arbitrary number of the data blocks.

(33) The data compression method according to (29), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and a higher priority is given to a data block having a larger number of reference appearances.

(34) The data compression method according to (29), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, the appearance order and appearance frequency of data blocks holding the same data among the data blocks are obtained, and priorities are set accordingly.

(35) The data compression method according to (29), wherein, after the data blocks having the identical part are extracted, common data included in the data blocks is extracted and the number of occurrences of the common data is counted.

(36) The data compression method according to (29), wherein, after the data blocks having the identical part are extracted, a repeated or continuous portion of identical or similar data blocks is detected from among the extracted data blocks.

(37) The data compression method according to (29), wherein, after the data blocks having the identical part are extracted, priorities are set in descending order of the use frequency of identical or similar data blocks among the extracted data blocks.

(38) The data compression method according to (29), wherein, after the data blocks having the identical part are extracted, the priorities set according to the use frequency of identical or similar data blocks among the extracted data blocks are corrected.

(39) The data compression method according to (29), wherein the sequence of the predetermined length has a length that can be easily applied to the mathematical formula.

(40) The data compression method according to (29), wherein the patterned data that is likely to occur is generated based on a change in the sequence.

(41) The data compression method according to (29) or (40), wherein the predetermined mathematical formula is selected by comparison with patterned data prepared in advance, so that a well-fitting mathematical formula is chosen.

(42) The data compression method according to (41), wherein the difference between the selected mathematical formula and the patterned data that is likely to occur is used as a parameter when converting the sequence into the mathematical formula.

(43) A data reduction method for reducing the capacity of compressed data, characterized by: compressing predetermined source data by a predetermined method; extending the compressed data, starting from a predetermined data length, until the data no longer matches, and extracting the longest identical part; repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; converting the extracted identical parts and the remaining data into a sequence of predetermined length; and converting the converted data into a predetermined mathematical formula using the converted data as a parameter.

(44) The data reduction method according to (43), wherein, in the search for extracting the identical part, the target data is collated in both the forward direction and the backward direction.

(45) The data reduction method according to (43), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and collation is performed on data obtained by concatenating the data blocks in an arbitrary order.

(46) The data reduction method according to (43), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and collation is performed on data obtained by concatenating an arbitrary number of the data blocks.

(47) The data reduction method according to (43), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and a higher priority is given to a data block having a larger number of reference appearances.

(48) The data reduction method according to (43), wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, the appearance order and appearance frequency of data blocks holding the same data among the data blocks are obtained, and priorities are set accordingly.

(49) The data reduction method according to (43), wherein, after the data blocks having the identical part are extracted, common data included in the data blocks is extracted and the number of occurrences of the common data is counted.

(50) The data reduction method according to (43), wherein, after the data blocks having the identical part are extracted, a repeated or continuous portion of identical or similar data blocks is detected from among the extracted data blocks.

(51) The data reduction method according to (43), wherein, after the data blocks having the identical part are extracted, priorities are set in descending order of the use frequency of identical or similar data blocks among the extracted data blocks.

(52) The data reduction method according to (43), wherein, after the data blocks having the identical part are extracted, the priorities set according to the use frequency of identical or similar data blocks among the extracted data blocks are corrected.

(53) The data reduction method according to (43), wherein the sequence of the predetermined length has a length that can be easily applied to the mathematical formula.

(54) The data reduction method according to (43), wherein the patterned data that is likely to occur is generated based on a change in the sequence.

(55) The data reduction method according to (43) or (54), wherein the predetermined mathematical formula is selected by comparison with patterned data prepared in advance, so that a well-fitting mathematical formula is chosen.

(56) The data reduction method according to (55), wherein the difference between the selected mathematical formula and the patterned data that is likely to occur is used as a parameter when converting the sequence into the mathematical formula.

(57) A reduced-capacity data transmission system comprising: means for extending predetermined source data, starting from a predetermined data length, until the data no longer matches, and extracting the longest identical part; means for repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; means for converting the extracted identical parts and the remaining data into a sequence of predetermined length; means for comparing the converted sequence with patterned data that is likely to occur; means for converting the data into a predetermined mathematical formula using the compared difference as a parameter; and transmission means for transmitting the reduced-capacity data converted into the predetermined mathematical formula to a desired destination.

(58) The reduced-capacity data transmission system according to (57), wherein the transmission means uses radio waves and/or light and/or electrical communication.

(59) A reduced-capacity data recording system comprising: means for extending predetermined source data, starting from a predetermined data length, until the data no longer matches, and extracting the longest identical part; means for repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; means for converting the extracted identical parts and the remaining data into a sequence of predetermined length; means for comparing the converted sequence with patterned data that is likely to occur; means for converting the data into a predetermined mathematical formula using the compared difference as a parameter; and recording means for recording the reduced-capacity data converted into the predetermined mathematical formula on a recording medium.

(60) The reduced-capacity data recording system according to (59), wherein the recording medium is a removable disk-shaped medium and/or a tape-shaped medium and/or a storage medium composed of an IC memory.

(61) A reduced-capacity data reproduction system comprising: means for extending predetermined source data, starting from a predetermined data length, until the data no longer matches, and extracting the longest identical part; means for repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; means for converting the extracted identical parts and the remaining data into a sequence of predetermined length; means for comparing the converted sequence with patterned data that is likely to occur; means for converting the data into a predetermined mathematical formula using the compared difference as a parameter; and reproduction means for recording the reduced-capacity data converted into the predetermined mathematical formula on a predetermined recording medium and appropriately reproducing the recorded reduced-capacity data.

(62) The reduced-capacity data reproduction system according to (61), wherein the recording medium is a removable disk-shaped medium and/or a tape-shaped medium and/or a storage medium composed of an IC memory.

(63) A reduced-capacity data reproduction system comprising: means for extending predetermined source data, starting from a predetermined byte length, until the data no longer matches, and extracting the longest identical part; means for repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; means for converting the extracted identical parts and the remaining data into a sequence of predetermined length; means for comparing the converted sequence with patterned data that is likely to occur; means for converting the data into a predetermined mathematical formula using the compared difference as a parameter; and reproduction means capable of appropriately reproducing the reduced-capacity data converted into the predetermined mathematical formula.

In this way, identical parts of the data are extracted and consolidated into a single piece of identical data, the consolidated data is further converted into a predetermined sequence and compared with patterned data that is likely to occur, and the difference is used as a parameter of a mathematical formula. As a result, the amount of data to be transmitted can be reduced while the information it represents remains very large, so that data can be transmitted more efficiently than by transmitting the data itself.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is an overall flow chart for realizing the data reduction method of the present invention.

FIG. 2 is an overall block diagram for realizing the data reduction method.

FIG. 3 is a block diagram showing a block generation / rule generation unit and a coding calculation unit that constitute the capacity reduction method.

FIG. 4 is an explanatory diagram showing a method of searching for the same block.

FIG. 5 is an explanatory diagram showing a method for searching for an approximate block.

FIG. 6 is a conceptual diagram of usage frequency measurement.

FIG. 7 is a conceptual diagram for detecting unique data.

FIG. 8 is a conceptual diagram for generating reduced-volume data using mathematical formulas.

BEST MODE FOR CARRYING OUT THE INVENTION

Next, embodiments of the data reduction method, the reduced-capacity data generation system, the data compression method for compressing data reduced by the data reduction method, the data reduction method for reducing the capacity of compressed data, the data transmission system, and the data recording system according to the present invention will be described with reference to the drawings.

As shown in FIG. 1, the system for reducing data capacity according to the present invention performs unique conversion processing on source data 100, such as video data or music data, to create unique conversion data 210, converts the unique conversion data 210 into a predetermined mathematical formula to generate formula conversion data 220, and thereby creates reduced-capacity data 300 whose data amount is reduced for transmission. The unique conversion data 210 and the formula conversion data 220 constitute a coding calculation unit 200. Although not shown, at the time of transmission the created reduced-capacity data 300 may be compressed by a well-known compression method and transmitted; the source data 100 may be compressed first and then converted into the reduced-capacity data 300 and transmitted; or the source data may be compressed, converted into reduced-capacity data, and then compressed again by a compression method. The reduced-capacity data 300 is transmitted to a desired destination by radio wave, light, or electrical communication, or can be recorded on a recording medium such as a disk-shaped medium (for example, a CD-ROM), a tape-shaped medium (for example, DAT or DDS), or an IC memory. The recording medium is not limited to these and includes, for example, optical, magnetic, physical, and chemical recording media. Needless to say, the recorded reduced-capacity data can be reproduced and used.

As shown in FIG. 2, the data reduction system having these features comprises a unique conversion data generation unit 210 that converts the source data 100 into a predetermined sequence to generate unique conversion data; a formula conversion data generation unit 220 that treats the unique conversion data as meaningful data and converts it into mathematical formulas to generate formula conversion data; and a block generation and rule generation unit 400 that, for the unique conversion data generation unit 210 and the formula conversion data generation unit 220, extracts the characteristics of the data itself and performs block processing and rule processing to reduce the data amount.

As shown in FIG. 3, the block generation and rule generation unit 400 consists of a common part extraction unit 410 for extracting common parts of the data, a difference extraction unit 420, and the coding calculation unit 200.

The common part extraction unit 410 is divided into an identical part extraction unit 411 that extracts identical parts of the data, an approximate part extraction unit 412 that extracts approximate parts of the data, and a data weighting unit 413 that weights the data by assigning priorities based on the frequency of use of the identical parts.

The identical part extraction unit 411 includes a longest identical part extraction logic 414 and a different-direction identical part extraction logic 415. The longest identical part extraction logic 414 extracts the longest identical part of the data: starting from a predetermined data length, it extends the data being searched until it no longer matches, and extracts the longest identical part. In other words, the largest identical part can be extracted, and the more identical parts there are, the smaller the amount of data to be sent. The different-direction identical part extraction logic 415 searches from the reverse direction and extracts identical parts: when data is collated during a forward search, it is compared not only in the forward direction but also in the reverse direction. For example, as shown in FIG. 4, when the block “ABCD” 20 is present, the block that matches in the forward direction is “ABCD” 10 and the block that matches in the backward direction is “DCBA” 11, and both can be processed as the same block.
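The two extraction logics described above can be sketched as follows. This is only an illustrative sketch, not the implementation disclosed in the specification: the function names, the default seed length `min_len=4`, and the choice to keep the two occurrences non-overlapping are all assumptions made for the example.

```python
def longest_identical_part(data: bytes, min_len: int = 4):
    """Return (length, first_pos, second_pos) of the longest repeated part, or None.

    Assumption: min_len plays the role of the "predetermined data length".
    """
    best = None
    n = len(data)
    for i in range(n - 2 * min_len + 1):
        seed = data[i:i + min_len]
        j = data.find(seed, i + min_len)
        while j != -1:
            length = min_len
            # Extend the match until the data no longer agrees
            # (the first occurrence is not allowed to run into the second).
            while (j + length < n and i + length < j
                   and data[i + length] == data[j + length]):
                length += 1
            if best is None or length > best[0]:
                best = (length, i, j)
            j = data.find(seed, j + 1)
    return best


def find_either_direction(data: bytes, block: bytes):
    """Return sorted (position, direction) pairs where block occurs forward or reversed."""
    patterns = {"forward": block}
    if block[::-1] != block:  # a palindromic block needs only one search
        patterns["reverse"] = block[::-1]
    hits = []
    for label, pattern in patterns.items():
        pos = data.find(pattern)
        while pos != -1:
            hits.append((pos, label))
            pos = data.find(pattern, pos + 1)
    return sorted(hits)
```

With the FIG. 4 example, `find_either_direction(b"ABCDxxDCBA", b"ABCD")` reports one forward hit and one reverse hit, so both occurrences could be referred to the same block.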

As shown in FIG. 3, the approximate part extraction unit 412 extracts common blocks that include dissimilar parts, and is composed of out-of-order data detection logic 416 and different data length detection logic 417.

The out-of-order data detection logic 416 detects portions that use the same data strings in a different order: it divides the data to be searched into data blocks of an arbitrary size and performs matching against data obtained by concatenating the divided data blocks in an arbitrary order.
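As a hedged illustration of out-of-order detection, the following Python sketch treats two data portions as equivalent when they are built from the same blocks, whatever the order. The helper names `split_blocks` and `same_blocks_any_order` and the fixed block size are assumptions; the description only says that the block size is arbitrary.

```python
from collections import Counter


def split_blocks(data, size):
    """Divide data into consecutive blocks of `size` symbols (the last may be shorter)."""
    return [data[k:k + size] for k in range(0, len(data), size)]


def same_blocks_any_order(a, b, size):
    """True if a and b consist of the same blocks, concatenated in any order."""
    return Counter(split_blocks(a, size)) == Counter(split_blocks(b, size))


print(same_blocks_any_order("ABCDEF", "EFABCD", 2))
print(same_blocks_any_order("ABCDEF", "ABCDEG", 2))
```

Comparing multisets of blocks avoids enumerating every permutation of the blocks, which would grow factorially with the number of blocks.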

The different data length detection logic 417 detects portions that include the same data string but have a different total data length: it divides the data to be searched into data blocks of an arbitrary size and performs matching against data obtained by concatenating an arbitrary number of the divided data blocks. FIG. 5 is a conceptual diagram of approximate part extraction. Given the data block “EFGHIABC” 41, the blocks approximate to it are “ABDEFGHI” 40 and “AABCDDDEFGHI” 42. The part common to these, “EFGHI” 50, is the common block. By gathering approximate blocks that share this common part, rules can be established for them. Gathering approximate blocks in this way, each combining dissimilar parts around a partially common block, makes it possible, without being limited to the processing of identical blocks, to apply a predetermined rule to the set of approximate blocks based on statistics and the like, and thus to reduce the amount of data.
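The FIG. 5 example can be reproduced with a small Python sketch that finds the longest part shared by all approximate blocks and then splits each block into its dissimilar parts around that common block. The function names `common_block` and `dissimilar_parts` are hypothetical, and the brute-force search is chosen for clarity rather than taken from the description.

```python
def common_block(blocks):
    """Longest string contained in every block (empty string if none)."""
    shortest = min(blocks, key=len)
    # Try candidates from the longest possible length downwards.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in block for block in blocks):
                return candidate
    return ""


def dissimilar_parts(blocks, common):
    """For each block, the parts before and after the common block."""
    parts = {}
    for block in blocks:
        before, _, after = block.partition(common)
        parts[block] = (before, after)
    return parts


blocks = ["ABDEFGHI", "EFGHIABC", "AABCDDDEFGHI"]
common = common_block(blocks)
print(common)
print(dissimilar_parts(blocks, common))
```

Only the dissimilar parts and a reference to the common block then need to be kept for each approximate block.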

As shown in FIG. 3, the data weighting unit 413 is composed of usage frequency measurement logic 418, which determines the priority order based on the frequency of common blocks, and unique data detection logic 419, which processes data having no common block.

The usage frequency measurement logic 418 determines priority based on usage frequency, the content of the referenced blocks, and the like: it divides the data to be searched into data blocks of an arbitrary size and counts the number of appearances of the divided data blocks. The unique data detection logic 419 measures the similarity of data to determine priority: it divides the data to be searched into data blocks of an arbitrary size and determines the order of appearance and the frequency of use of the divided data blocks.

FIG. 6 is a conceptual diagram of the usage frequency measurement logic. For example, if the “A” data block is referenced by “A1” and “A2” in the source data 100 and the “B” data block is referenced only by “B1”, then “A”, being referenced from two places, has a higher priority than “B”, which is referenced from one place.
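Usage frequency measurement can be sketched in Python as a simple count of block appearances, with the most frequently used block ranked first. The function name `usage_priority` and the fixed block size are assumptions for illustration; how the count is combined with other criteria is not specified in the description.

```python
from collections import Counter


def usage_priority(data, size):
    """Blocks of `size` symbols with their appearance counts, most frequent first."""
    blocks = [data[k:k + size] for k in range(0, len(data), size)]
    return Counter(blocks).most_common()


print(usage_priority("ABABCDAB", 2))
```

Here the block “AB” appears three times and “CD” once, so “AB” would receive the higher priority, in the same way that “A” outranks “B” in FIG. 6.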

FIG. 7 is a conceptual diagram of the unique data detection logic. In the source data 100, the data blocks “HI” and “ys” are common blocks; data that does not include a common block is given a lower priority.

As shown in FIG. 3, the difference extraction unit 420 is composed of a common part comparison unit 421, which compares the data of blocks having a common part with each other, and a difference comparison unit 422, which compares the differences between the dissimilar parts of blocks having a common part.

As shown in FIG. 3, the common part comparison unit 421 compares common parts to extract differences, and is composed of division/comparison logic 423, which divides and compares the dissimilar parts of a common block, and repetitive block detection logic 424, which detects repetition of a common block. The division/comparison logic 423 divides the data into arbitrary data lengths and compares them with the blocks of the extracted common part: it divides the data to be searched into data blocks of an arbitrary size and measures the number of them included in the common part.

The repetitive block detection logic 424 divides the data into arbitrary data lengths and detects continuity: it divides the data to be searched into data blocks of an arbitrary size and detects consecutive portions within a specific common part.
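Detection of consecutive repetitions can be sketched in Python by grouping adjacent equal blocks. The function name `repeated_runs` and the fixed block size are assumptions made for illustration only.

```python
from itertools import groupby


def repeated_runs(data, size):
    """Consecutive runs of identical blocks as (block, repetition count) pairs."""
    blocks = [data[k:k + size] for k in range(0, len(data), size)]
    return [(block, len(list(group))) for block, group in groupby(blocks)]


print(repeated_runs("ABABABCDAB", 2))
```

A run such as three consecutive “AB” blocks can then be recorded once together with its repetition count instead of being stored three times.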

The difference comparison unit 422 is composed of similarity detection logic 425, which detects the similarity of the dissimilar parts in blocks that include a common part, and common part content ratio measurement logic 426, which measures the ratio of the common part in blocks that include dissimilar parts. The similarity detection logic 425 determines priority based on the similarity between pieces of difference data; it measures similarity in the same manner as for the common part and determines the priority. The common part content ratio measurement logic 426 corrects the priority based on the proportion included in the common part; it measures the frequency of use of the common part and corrects the priority.
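One way to picture the common part content ratio is as the share of each block that is covered by the common block. The Python sketch below makes two assumptions that are not stated in the description: the ratio is the length of the common part divided by the length of the block, and a higher ratio leads to a higher priority. The names `content_ratio` and `rank_by_ratio` are hypothetical.

```python
from fractions import Fraction


def content_ratio(block, common):
    """Share of the block covered by the common part (0 if it is absent)."""
    if not common or common not in block:
        return Fraction(0)
    return Fraction(len(common), len(block))


def rank_by_ratio(blocks, common):
    """Blocks ordered from the highest to the lowest content ratio."""
    return sorted(blocks, key=lambda b: content_ratio(b, common), reverse=True)


blocks = ["AABCDDDEFGHI", "ABDEFGHI", "EFGHIABC"]
print([str(content_ratio(b, "EFGHI")) for b in blocks])
print(rank_by_ratio(blocks, "EFGHI"))
```

Exact fractions are used so that blocks with equal ratios compare as equal and keep their original relative order.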

The coding calculation unit 200 is composed of the unique conversion data generation unit 210, which uniquely converts the data that has been blocked and ruled by the block generation and rule generation unit 400 described above, and the mathematical expression conversion data generation unit 220, which converts the data into a predetermined mathematical expression. The unique conversion data generation unit 210 converts the common parts of the blocked and ruled data into a numerical sequence of arbitrary length.

The mathematical expression conversion data generation unit 220 generates predetermined data by applying a predetermined mathematical expression to the data consisting of the sequence created by the unique conversion data generation unit 210. Based on the fluctuation of that sequence, it detects the trend with which given values appear and disappear, compares this trend with trend sequence data prepared in advance, and selects the closest-matching mathematical expression. The difference from the pattern drawn by the selected expression is then converted into the expression as a parameter.

FIG. 8 is a conceptual diagram of generating reduced-volume data by means of a mathematical expression. The source data 100, “ABD8WE89FA”, is uniquely converted into “3249864530178...” to generate a sequence 100A; this sequence 100A is compared with the trend sequence data prepared in advance and, with the difference from the pattern of the selected expression used as a parameter, is converted into a mathematical expression such as “7/9 + 11/3 * 5/13”.
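The idea of selecting a prepared pattern and keeping only the difference as a parameter can be sketched as follows. This is an illustrative assumption, not the expression format of FIG. 8: the candidate patterns in `PATTERNS`, the sum-of-absolute-differences criterion, and the functions `encode` and `decode` are all hypothetical.

```python
# Hypothetical trend patterns prepared in advance: name -> value at position k.
PATTERNS = {
    "constant": lambda k: 5,
    "linear": lambda k: k,
    "square": lambda k: k * k,
}


def encode(sequence, patterns=PATTERNS):
    """Select the closest pattern and return (pattern name, differences)."""
    def distance(name):
        return sum(abs(v - patterns[name](k)) for k, v in enumerate(sequence))

    best = min(patterns, key=distance)
    differences = [v - patterns[best](k) for k, v in enumerate(sequence)]
    return best, differences


def decode(name, differences, patterns=PATTERNS):
    """Restore the original sequence from the pattern name and the differences."""
    return [patterns[name](k) + d for k, d in enumerate(differences)]


name, diffs = encode([0, 1, 5, 9, 16])
print(name, diffs)
print(decode(name, diffs))
```

Because the differences are stored exactly, decoding returns the original sequence unchanged; when the pattern fits well, most differences are zero or small and can be represented compactly.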

In this way, the source data 100 is uniquely converted into a predetermined sequence to make it meaningful data, while its statistical properties are captured through blocking and rule formation, and the patterns in the data are captured and expressed as mathematical expressions with parameters to generate reduced-volume data. All of the source data can thus be represented as common blocks, trend parameters, and rule-based data. This allows the amount of data to be reduced without discarding any of the data that makes up the source data, so that the original can be restored completely.

Industrial applicability

As described above, according to the present invention, predetermined data is extended, starting from a predetermined byte length, until the data no longer matches, and the longest identical part is extracted; for the data excluding the extracted longest identical part, the process of extending the data, starting from the predetermined byte length, until it no longer matches and extracting the next longest identical part is repeated; the extracted identical parts and the remaining data are converted into a numerical sequence of predetermined length; and the converted sequence is converted into a mathematical expression using, as a parameter, its difference from patterned data expected to occur. The data to be transmitted can thereby be converted into predetermined meaningful data, which has the effect of improving transmission efficiency by reducing the amount of data that is actually transmitted.
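The repeated extraction summarized above can be pictured with the following Python sketch. It is a simplified reading under stated assumptions, not the claimed method itself: the functions `longest_repeat`, `reduce_blocks` and `expand` are hypothetical, extracted parts are replaced by integer codes that index a block table, and the later conversion into a mathematical expression is omitted.

```python
def longest_repeat(items, min_len):
    """Longest run of items that occurs twice without overlap (empty tuple if none)."""
    best = ()
    n = len(items)
    for i in range(n):
        for j in range(i + min_len, n - min_len + 1):
            k = 0
            # Extend the comparison until the data no longer matches.
            while j + k < n and i + k < j and items[i + k] == items[j + k]:
                k += 1
            if k >= min_len and k > len(best):
                best = tuple(items[i:i + k])
    return best


def reduce_blocks(data, min_len=2):
    """Repeatedly extract the longest identical part and replace it by a code."""
    items, table = list(data), []
    while True:
        block = longest_repeat(items, min_len)
        if not block:
            return table, items
        code = len(table)
        table.append(block)
        out, i = [], 0
        while i < len(items):
            if tuple(items[i:i + len(block)]) == block:
                out.append(code)
                i += len(block)
            else:
                out.append(items[i])
                i += 1
        items = out


def expand(table, items):
    """Restore the original string from the block table and the coded items."""
    out = []
    for item in items:
        out.extend(expand(table, table[item]) if isinstance(item, int) else item)
    return "".join(out)


table, items = reduce_blocks("ABCDxABCDyABz")
print(table, items)
print(expand(table, items))
```

Each pass shortens the item list, so the loop terminates, and because nothing is discarded the original data is restored exactly by `expand`.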

Claims

1. A data reduction method, characterized in that predetermined source data is extended, starting from a predetermined data length, until the data no longer matches, and the longest identical part is extracted; the data excluding the extracted longest identical part is extended, starting from the predetermined data length, until the data no longer matches, and the next longest identical part is extracted, this process being repeated; the extracted identical parts and the remaining data are converted into a numerical sequence of a predetermined length; the converted sequence is compared with patterned data expected to occur, and the difference is taken as a parameter; and the sequence is converted into a predetermined mathematical expression based on the parameter to generate and output reduced-volume data.

2. The data reduction method according to claim 1, wherein, in the search for extracting the identical part, matching is performed in both the forward direction and the reverse direction when the target data is matched.

3. The data reduction method according to claim 1, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and matching is performed against data obtained by concatenating the data blocks in an arbitrary order.

4. The data reduction method according to claim 1, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and matching is performed against data obtained by concatenating an arbitrary number of the data blocks.

5. The data reduction method according to claim 1, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and data blocks that are referenced a larger number of times are given a higher priority.

6. The data reduction method according to claim 1, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and priorities are set by determining the order of appearance and the frequency of appearance of data blocks of identical data among the data blocks.

7. The data reduction method according to claim 1, wherein, after the data blocks having the identical part are extracted, the common data included in the data blocks is extracted and its number is measured.

8. The data reduction method according to claim 1, wherein, after the data blocks having the identical part are extracted, a repeated or consecutive portion of identical or similar data blocks is detected among the extracted data blocks.

9. The data reduction method according to claim 1, wherein, after the data blocks having the identical part are extracted, priorities are set in descending order of frequency of use of identical or similar data blocks among the extracted data blocks.

10. The data reduction method according to claim 1, wherein, after the data blocks having the identical part are extracted, the priority set according to the frequency of use of identical or similar data blocks among the extracted data blocks is corrected.

11. The data reduction method according to claim 1, wherein the numerical sequence of the predetermined length has a length that is easily applied to the mathematical expression.

12. The data reduction method according to claim 1, wherein the graph is generated based on the variation of the sequence.

13. The data reduction method according to claim 1 or 12, wherein the predetermined mathematical expression is selected, by comparison with pattern data prepared in advance, as the closest-matching mathematical expression.

14. The data reduction method according to claim 13, wherein, for the selected mathematical expression, the difference from the patterned data expected to occur is used as a parameter when the sequence is converted into the mathematical expression.

15. A reduced-volume data generation system, characterized by comprising: means for extending predetermined source data, starting from a predetermined byte length, until the data no longer matches, and extracting the longest identical part; means for repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; means for converting the extracted identical parts and the remaining data into a numerical sequence of a predetermined length; means for comparing the converted sequence with patterned data expected to occur; and means for converting the compared difference, as a parameter, into a predetermined mathematical expression.

16. The reduced-volume data generation system according to claim 15, wherein, in the search for extracting the identical part, matching is performed in both the forward direction and the reverse direction when the target data is matched.

17. The reduced-volume data generation system according to claim 15, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and matching is performed against data obtained by concatenating the data blocks in an arbitrary order.

18. The reduced-volume data generation system according to claim 15, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and matching is performed against data obtained by concatenating an arbitrary number of the data blocks.

19. The reduced-volume data generation system according to claim 15, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and data blocks that are referenced a larger number of times are given a higher priority.

20. The reduced-volume data generation system according to claim 15, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and priorities are set by determining the order of appearance and the frequency of appearance of data blocks of identical data among the data blocks.

21. The reduced-volume data generation system according to claim 15, wherein, after the data blocks having the identical part are extracted, the common data included in the data blocks is extracted and its number is measured.

22. The reduced-volume data generation system according to claim 15, wherein, after the data blocks having the identical part are extracted, a repeated or consecutive portion of identical or similar data blocks is detected among the extracted data blocks.

23. The reduced-volume data generation system according to claim 15, wherein, after the data blocks having the identical part are extracted, priorities are set in descending order of frequency of use of identical or similar data blocks among the extracted data blocks.

24. The reduced-volume data generation system according to claim 15, wherein, after the data blocks having the identical part are extracted, the priority set according to the frequency of use of identical or similar data blocks among the extracted data blocks is corrected.

25. The reduced-volume data generation system according to claim 15, wherein the numerical sequence of the predetermined length has a length that is easily applied to the mathematical expression.

26. The reduced-volume data generation system according to claim 15, wherein the patterned data expected to occur is generated based on the variation of the sequence.

27. A reduced-volume data generation system, wherein the predetermined mathematical expression is selected, by comparison with pattern data prepared in advance, as the closest-matching mathematical expression.

28. The reduced-volume data generation system according to claim 27, wherein, for the selected mathematical expression, the difference from the patterned data expected to occur is used as a parameter when the sequence is converted into the mathematical expression.

29. A data compression method for compressing reduced-volume data, characterized in that predetermined source data is extended, starting from a predetermined data length, until the data no longer matches, and the longest identical part is extracted; the data excluding the extracted longest identical part is extended, starting from the predetermined data length, until the data no longer matches, and the next longest identical part is extracted, this process being repeated; the extracted identical parts and the remaining data are converted into a numerical sequence of a predetermined length; the converted sequence is compared with patterned data expected to occur and, with the difference as a parameter, is converted into a predetermined mathematical expression; and the converted data is compressed by a predetermined method.

30. The data compression method for compressing data reduced in volume by the data reduction method according to claim 29, wherein, in the search for extracting the identical part, matching is performed in both the forward direction and the reverse direction.

31. The data compression method for compressing data reduced in volume by the data reduction method according to claim 29, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and matching is performed against data obtained by concatenating the data blocks in an arbitrary order.

32. The data compression method for compressing data reduced in volume by the data reduction method according to claim 29, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and matching is performed against data obtained by concatenating an arbitrary number of the data blocks.

33. The data compression method for compressing data reduced in volume by the data reduction method according to claim 29, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and data blocks that are referenced a larger number of times are given a higher priority.

34. The data compression method for compressing data reduced in volume by the data reduction method according to claim 29, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and priorities are set by determining the order of appearance and the frequency of appearance of data blocks of identical data among the data blocks.

35. The data compression method for compressing data reduced in volume by the data reduction method according to claim 29, wherein, after the data blocks having the identical part are extracted, the common data included in the data blocks is extracted and its number is measured.

36. The data compression method for compressing data reduced in volume by the data reduction method according to claim 29, wherein, after the data blocks having the identical part are extracted, a repeated or consecutive portion of identical or similar data blocks is detected among the extracted data blocks.

37. The data compression method for compressing data reduced in volume by the data reduction method according to claim 29, wherein, after the data blocks having the identical part are extracted, priorities are set in descending order of frequency of use of identical or similar data blocks among the extracted data blocks.

38. The data compression method for compressing data reduced in volume by the data reduction method according to claim 29, wherein, after the data blocks having the identical part are extracted, the priority set according to the frequency of use of identical or similar data blocks among the extracted data blocks is corrected.

39. The data compression method for compressing data reduced in volume by the data reduction method according to claim 29, wherein the numerical sequence of the predetermined length has a length that is easily applied to the mathematical expression.

40. The data compression method for compressing data reduced in volume by the data reduction method according to claim 29, wherein the graph is generated based on the variation of the sequence.

41. The data compression method for compressing data reduced in volume by the data reduction method according to claim 29 or 40, wherein the predetermined mathematical expression is selected, by comparison with a graph prepared in advance, as the closest-matching mathematical expression.

42. The data compression method for compressing data reduced in volume by the data reduction method according to claim 41, wherein, for the selected mathematical expression, the difference from the drawn graph is used as a parameter when the sequence is converted into the mathematical expression.

43. A data reduction method for reducing the volume of compressed data, characterized in that predetermined source data is compressed by a predetermined method; the compressed data is extended, starting from a predetermined data length, until the data no longer matches, and the longest identical part is extracted; the data excluding the extracted longest identical part is extended, starting from the predetermined data length, until the data no longer matches, and the next longest identical part is extracted, this process being repeated; the extracted identical parts and the remaining data are converted into a numerical sequence of a predetermined length; and the converted sequence is compared with patterned data expected to occur and, with the difference as a parameter, is converted into a predetermined mathematical expression.

44. The data reduction method for reducing the volume of compressed data according to claim 43, wherein, in the search for extracting the identical part, matching is performed in both the forward direction and the reverse direction when the target data is matched.

45. The data reduction method for reducing the volume of compressed data according to claim 43, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and matching is performed against data obtained by concatenating the data blocks in an arbitrary order.

46. The data reduction method for reducing the volume of compressed data according to claim 43, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and matching is performed against data obtained by concatenating an arbitrary number of the data blocks.

47. The data reduction method for reducing the volume of compressed data according to claim 43, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and data blocks that are referenced a larger number of times are given a higher priority.

48. The data reduction method for reducing the volume of compressed data according to claim 43, wherein, in the search for extracting the identical part, the data to be searched is divided into data blocks of an arbitrary size, and priorities are set by determining the order of appearance and the frequency of appearance of data blocks of identical data among the data blocks.

49. The data reduction method for reducing the volume of compressed data according to claim 43, wherein, after the data blocks having the identical part are extracted, the common data included in the data blocks is extracted and its number is measured.

50. The data reduction method for reducing the volume of compressed data according to claim 43, wherein, after the data blocks having the identical part are extracted, a repeated or consecutive portion of identical or similar data blocks is detected among the extracted data blocks.

51. The data reduction method for reducing the volume of compressed data according to claim 43, wherein, after the data blocks having the identical part are extracted, priorities are set in descending order of frequency of use of identical or similar data blocks among the extracted data blocks.

52. The data reduction method for reducing the volume of compressed data according to claim 43, wherein, after the data blocks having the identical part are extracted, the priority set according to the frequency of use of identical or similar data blocks among the extracted data blocks is corrected.

53. The data reduction method for reducing the volume of compressed data according to claim 43, wherein the numerical sequence of the predetermined length has a length that is easily applied to the mathematical expression.

54. The data reduction method for reducing the volume of compressed data according to claim 43, wherein the graph is generated based on the variation of the sequence.

55. A data reduction method for reducing the volume of compressed data, wherein the predetermined mathematical expression is selected, by comparison with pattern data prepared in advance, as the closest-matching mathematical expression.

56. The data reduction method for reducing the volume of compressed data according to claim 55, wherein, for the selected mathematical expression, the difference from the patterned data expected to occur is used as a parameter when the sequence is converted into the mathematical expression.

57. A reduced-volume data transmission system, characterized by comprising: means for extending predetermined source data, starting from a predetermined byte length, until the data no longer matches, and extracting the longest identical part; means for repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; means for converting the extracted identical parts and the remaining data into a numerical sequence of a predetermined length; means for comparing the converted sequence with patterned data expected to occur; means for converting the compared difference, as a parameter, into a predetermined mathematical expression; and transmission means for transmitting the reduced-volume data converted into the predetermined mathematical expression to a desired destination.

58. The reduced-volume data transmission system according to claim 57, wherein the transmission means uses radio waves and/or light and/or electrical communication.

59. A reduced-volume data recording system, characterized by comprising: means for extending predetermined source data, starting from a predetermined byte length, until the data no longer matches, and extracting the longest identical part; means for repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; means for converting the extracted identical parts and the remaining data into a numerical sequence of a predetermined length; means for comparing the converted sequence with patterned data expected to occur; means for converting the compared difference, as a parameter, into a predetermined mathematical expression; and recording means for recording the reduced-volume data converted into the predetermined mathematical expression on a recording medium.

60. The reduced-volume data recording system according to claim 59, wherein the recording medium is a storage medium consisting of a movable disk-shaped medium and/or a tape-shaped medium and/or an IC memory.

61. A reduced-volume data reproduction system, characterized by comprising: means for extending predetermined source data, starting from a predetermined byte length, until the data no longer matches, and extracting the longest identical part; means for repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; means for converting the extracted identical parts and the remaining data into a numerical sequence of a predetermined length; means for comparing the converted sequence with patterned data expected to occur; means for converting the compared difference, as a parameter, into a predetermined mathematical expression; and reproducing means for recording the reduced-volume data converted into the predetermined mathematical expression on a predetermined recording medium and reproducing the recorded reduced-volume data as appropriate.

62. The reduced-volume data reproduction system according to claim 61, wherein the recording medium is a storage medium consisting of a movable disk-shaped medium and/or a tape-shaped medium and/or an IC memory.

63. A reduced-volume data reproduction system, characterized by comprising: means for extending predetermined source data, starting from a predetermined byte length, until the data no longer matches, and extracting the longest identical part; means for repeatedly extending the data excluding the extracted longest identical part, starting from the predetermined data length, until the data no longer matches, and extracting the next longest identical part; means for converting the extracted identical parts and the remaining data into a numerical sequence of a predetermined length; means for comparing the converted sequence with patterned data expected to occur; means for converting the compared difference, as a parameter, into a predetermined mathematical expression; and reproducing means capable of reproducing the converted reduced-volume data as appropriate.