CN104579360B - A kind of method and apparatus of data processing - Google Patents
- Publication number
- CN104579360B (application CN201510059809.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- code value
- section
- coding
- character string
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
- H03M7/4006—Conversion to or from arithmetic code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2468—Fuzzy queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/06—Arrangements for sorting, selecting, merging, or comparing data on individual record carriers
- G06F7/08—Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6064—Selection of Compressor
- H03M7/6082—Selection strategies
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Automation & Control Theory (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- Human Computer Interaction (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
An embodiment of the present invention provides a method and apparatus for data processing. The method includes: encoding data using an arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, obtaining the code value from the code value interval; comparing the number of bits of the code value with the number of bits of the data to obtain a comparison result; and performing a storage operation according to the comparison result. Embodiments of the present invention can reduce the storage space of the data.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a method and apparatus for data processing.
Background
Arithmetic coding is another very useful lossless compression algorithm of recent years. Its core idea is to map all symbols that may appear in the source data to a set of integers and to assign each symbol its probability of occurrence (the probabilities of all symbols must sum to 1). According to these probabilities, each symbol occupies a contiguous half-open sub-interval of [0, 1] whose length equals the symbol's probability, and the sub-intervals of different symbols are mutually independent. The string to be encoded is first mapped to an integer sequence using the mapping table. Then, according to the probabilities of the symbols in the source data, the source data is transformed step by step into a real interval within [0, 1], and one real number in that interval is stored in the computer as the code value. The interval used for each encoding step is the interval produced by the previous step, and the ratio between the symbol probabilities stays constant at every step. During decoding, the binary code value is restored to the corresponding integer sequence by the inverse transformation, and the integer sequence is then mapped back to the original string. For example, take the integer set {0, 1, 2, 3} with probability distribution {0.2, 0.5, 0.2, 0.1}. For the input sequence <210013>, the successive coding intervals are [0.7, 0.9], [0.74, 0.84], [0.74, 0.76], [0.74, 0.744], [0.7408, 0.7428] and [0.7426, 0.7428]. The final code value interval of the data is [0.7426, 0.7428] (the coding interval corresponding to the last character of the sequence), and the code value of the data is a number in [0.7426, 0.7428].
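The interval narrowing in this example can be reproduced with a short sketch. The Python below is illustrative only and is not part of the patent text: the function name `coding_intervals` and the use of exact fractions (chosen here to avoid floating-point rounding) are assumptions.

```python
from fractions import Fraction

def coding_intervals(symbols, probs):
    """Return the successive (low, high) coding intervals for a symbol sequence."""
    # cum[s] is the lower bound of symbol s's sub-interval of [0, 1).
    cum = [Fraction(0)]
    for p in probs:
        cum.append(cum[-1] + p)
    low, high = Fraction(0), Fraction(1)
    intervals = []
    for s in symbols:
        width = high - low
        # Narrow the current interval to the part owned by symbol s.
        low, high = low + width * cum[s], low + width * cum[s + 1]
        intervals.append((low, high))
    return intervals

# Symbol set {0, 1, 2, 3} with probabilities {0.2, 0.5, 0.2, 0.1}.
probs = [Fraction(2, 10), Fraction(5, 10), Fraction(2, 10), Fraction(1, 10)]
intervals = coding_intervals([2, 1, 0, 0, 1, 3], probs)
for low, high in intervals:
    print(float(low), float(high))
```

Each step keeps only the slice of the current interval that belongs to the next symbol, so the interval width after each step is the product of the probabilities of the symbols encoded so far.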
Existing arithmetic coding does not consider whether compression actually yields a gain for the data to be encoded: it compresses the data directly and then stores the resulting code value. Because the code value of some data requires a large number of bits, the prior art can increase the storage space of the data.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for data processing that can reduce the storage space of data.
In a first aspect, a method of data processing is provided, including: encoding data using an arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, obtaining the code value from the code value interval; comparing the number of bits of the code value with the number of bits of the data to obtain a comparison result; and performing a storage operation according to the comparison result.
With reference to the first aspect, in a first possible implementation, the comparison result is that the number of bits of the code value is less than the number of bits of the data, and performing the storage operation according to the comparison result includes: storing the code value according to the comparison result.
With reference to the first aspect, in a second possible implementation, the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data, and performing the storage operation according to the comparison result includes: storing the data according to the comparison result.
With reference to the first possible implementation, in a third possible implementation, the method further includes: performing an application operation on the data according to the code value, where the application operation includes at least one of equivalence comparison, sorting, and fuzzy query.
With reference to the third possible implementation, in a fourth possible implementation, the data is an identity (ID) string and the application operation includes equivalence comparison; performing the application operation on the data according to the code value includes: when the code value is equal to a code value to be compared, determining that the data and the data corresponding to the code value to be compared are the same data.
With reference to the third possible implementation, in a fifth possible implementation, the data is an ID string or an alphabetic string of a field, and the application operation includes sorting; performing the application operation on the data according to the code value includes: determining, according to the magnitude of the code value, the position of the code value among the code values to be sorted, where the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
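As a rough illustration of why sorting by code value can stand in for sorting the data itself, the sketch below assumes fixed-length digit strings over the symbol set {0, 1, 2, 3}, sub-intervals laid out in symbol order, and the lower bound of the final coding interval taken as the code value. These choices, and the names `PROBS` and `code_value`, are assumptions made for illustration and are not taken from the patent.

```python
from fractions import Fraction

# Hypothetical probability table, reusing the background example.
PROBS = [Fraction(2, 10), Fraction(5, 10), Fraction(2, 10), Fraction(1, 10)]

def code_value(digits, probs=PROBS):
    """Lower bound of the final coding interval, used here as the code value."""
    cum = [Fraction(0)]
    for p in probs:
        cum.append(cum[-1] + p)
    low, width = Fraction(0), Fraction(1)
    for ch in digits:
        s = int(ch)
        # Move to symbol s's slice of the current interval.
        low, width = low + width * cum[s], width * probs[s]
    return low

ids = ["210013", "013212", "210003", "130021"]
by_code = sorted(ids, key=code_value)  # sort using code values only
print(by_code)
```

Under these assumptions the coding intervals of two different strings of equal length are disjoint and appear in the same order as the strings, so comparing code values gives the same ordering as comparing the strings, and equal code values imply equal strings.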
With reference to the third possible implementation, in a sixth possible implementation, the application operation includes fuzzy query; performing the application operation on the data according to the code value includes: determining whether the data includes a prefix string according to whether the code value lies within the coding interval of the prefix string of the required fuzzy query. When the code value lies within the coding interval of the prefix string of the required fuzzy query, the data includes the prefix string; when the code value does not lie within that coding interval, the data does not include the prefix string.
With reference to the first aspect or any one of the first to sixth possible implementations, in a seventh possible implementation, encoding the data using the arithmetic coding algorithm to obtain the code value interval includes: encoding the data using the arithmetic coding algorithm to obtain a coding interval; re-expanding the coding interval of the data to obtain a re-expanded coding interval; and continuing to encode the data using the arithmetic coding algorithm according to the re-expanded coding interval, to obtain the code value interval.
With reference to the seventh possible implementation, in an eighth possible implementation, re-expanding the coding interval of the data to obtain the re-expanded coding interval includes: when at least one of the following conditions is met, re-expanding the coding interval of the data to obtain the re-expanded coding interval, and recording the corresponding re-expansion character position: the number of characters of the data that have been encoded reaches a character count threshold; the length of the coding interval of the data is less than an interval threshold.
With reference to the first aspect or any one of the first to eighth possible implementations, in a ninth possible implementation, before obtaining the code value from the code value interval when a code value corresponding to the data exists in the code value interval, the method further includes: determining whether a suitable code value corresponding to the data exists in the code value interval.
With reference to the ninth possible implementation, in a tenth possible implementation, the method further includes: storing the data when no suitable code value corresponding to the data exists in the code value interval.
In a second aspect, a method of data processing is provided, including: encoding data using an arithmetic coding algorithm to obtain a coding interval; re-expanding the coding interval of the data to obtain a re-expanded coding interval; continuing to encode the data using the arithmetic coding algorithm according to the re-expanded coding interval, to obtain a code value interval; obtaining a code value from the code value interval; and storing the code value.
With reference to the second aspect, in a first possible implementation, re-expanding the coding interval of the data to obtain the re-expanded coding interval includes: when at least one of the following conditions is met, re-expanding the coding interval of the data to obtain the re-expanded coding interval, and recording the corresponding re-expansion character position: the number of characters of the data that have been encoded reaches a character count threshold; the length of the coding interval of the data is less than an interval threshold.
In a third aspect, a method of data processing is provided, including: obtaining a code value of data and a re-expansion character position; decoding the code value of the data using an arithmetic coding algorithm to obtain a decoding interval; rescaling the decoding interval of the data according to the re-expansion character position to obtain a rescaled decoding interval; and continuing to decode the data using the arithmetic coding algorithm according to the rescaled decoding interval, to obtain the data.
With reference to the third aspect, in a first possible implementation, rescaling the decoding interval of the data according to the re-expansion character position to obtain the rescaled decoding interval includes: determining a rescaling character position according to the re-expansion character position, where the re-expansion character position and the rescaling character position are mutually inverse; and rescaling the decoding interval of the data according to the rescaling character position to obtain the rescaled decoding interval.
In a fourth aspect, a method of data processing is provided, including: encoding data using an arithmetic coding algorithm to obtain a code value interval; obtaining a code value from the code value interval; storing the code value; and performing an application operation on the data according to the code value, where the application operation includes at least one of equivalence comparison, sorting, and fuzzy query.
With reference to the fourth aspect, in a first possible implementation, the data is an ID string and the application operation includes equivalence comparison; performing the application operation on the data according to the code value includes: when the code value is equal to a code value to be compared, determining that the data and the data corresponding to the code value to be compared are the same data.
With reference to the fourth aspect, in a second possible implementation, the data is an ID string or an alphabetic string of a field, and the application operation includes sorting; performing the application operation on the data according to the code value includes: determining, according to the magnitude of the code value, the position of the code value among the code values to be sorted, where the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
With reference to the fourth aspect, in a third possible implementation, the application operation includes fuzzy query; performing the application operation on the data according to the code value includes: determining whether the data includes a prefix string according to whether the code value lies within the coding interval of the prefix string of the required fuzzy query. When the code value lies within the coding interval of the prefix string of the required fuzzy query, the data includes the prefix string; when the code value does not lie within that coding interval, the data does not include the prefix string.
In a fifth aspect, a device for data processing is provided, including: a coding unit, configured to encode data using an arithmetic coding algorithm to obtain a code value interval; an acquiring unit, configured to obtain a code value from the code value interval when a code value corresponding to the data exists in the code value interval; a comparing unit, configured to compare the number of bits of the code value with the number of bits of the data to obtain a comparison result; and a first storage unit, configured to perform a storage operation according to the comparison result.
With reference to the fifth aspect, in a first possible implementation, the comparison result is that the number of bits of the code value is less than the number of bits of the data, and the first storage unit stores the code value according to the comparison result.
With reference to the fifth aspect, in a second possible implementation, the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data, and the first storage unit stores the data according to the comparison result.
With reference to the first possible implementation of the fifth aspect, in a third possible implementation, the device further includes: an application unit, configured to perform an application operation on the data according to the code value, where the application operation includes at least one of equivalence comparison, sorting, and fuzzy query.
With reference to the third possible implementation of the fifth aspect, in a fourth possible implementation, the data is an ID string and the application operation includes equivalence comparison; when the code value is equal to a code value to be compared, the application unit determines that the data and the data corresponding to the code value to be compared are the same data.
With reference to the third possible implementation of the fifth aspect, in a sixth possible implementation, the data is an ID string or an alphabetic string of a field, and the application operation includes sorting; the application unit determines, according to the magnitude of the code value, the position of the code value among the code values to be sorted, where the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
With reference to the third possible implementation of the fifth aspect, in an eighth possible implementation, the application operation includes fuzzy query; the application unit determines whether the data includes a prefix string according to whether the code value lies within the coding interval of the prefix string of the required fuzzy query. When the code value lies within the coding interval of the prefix string of the required fuzzy query, the data includes the prefix string; when the code value does not lie within that coding interval, the data does not include the prefix string.
With reference to the fifth aspect or any one of the first to ninth possible implementations of the fifth aspect, in a tenth possible implementation, the coding unit encodes the data using the arithmetic coding algorithm to obtain a coding interval; re-expands the coding interval of the data to obtain a re-expanded coding interval; and continues to encode the data using the arithmetic coding algorithm according to the re-expanded coding interval, to obtain the code value interval.
With reference to the tenth possible implementation of the fifth aspect, in an eleventh possible implementation, when at least one of the following conditions is met, the coding unit re-expands the coding interval of the data to obtain the re-expanded coding interval and records the corresponding re-expansion character position: the number of characters of the data that have been encoded reaches a character count threshold; the length of the coding interval of the data is less than an interval threshold.
With reference to the fifth aspect or any one of the first to eleventh possible implementations of the fifth aspect, in a twelfth possible implementation, the device further includes: a determining unit, configured to determine, before the acquiring unit obtains the code value, whether a suitable code value corresponding to the data exists in the code value interval.
With reference to the twelfth possible implementation of the fifth aspect, in a thirteenth possible implementation, the device further includes: a second storage unit, configured to store the data when no suitable code value corresponding to the data exists in the code value interval.
In a sixth aspect, a device for data processing is provided, including: a first coding unit, configured to encode data using an arithmetic coding algorithm to obtain a coding interval; an expanding unit, configured to re-expand the coding interval of the data to obtain a re-expanded coding interval; a second coding unit, configured to continue to encode the data using the arithmetic coding algorithm according to the re-expanded coding interval, to obtain a code value interval; an acquiring unit, configured to obtain a code value from the code value interval; and a storage unit, configured to store the code value.
With reference to the sixth aspect, in a first possible implementation, when at least one of the following conditions is met, the expanding unit re-expands the coding interval of the data to obtain the re-expanded coding interval and records the corresponding re-expansion character position: the number of characters of the data that have been encoded reaches a character count threshold; the length of the coding interval of the data is less than an interval threshold.
In a seventh aspect, a device for data processing is provided, including: a first acquiring unit, configured to obtain a code value of data and a re-expansion character position; a first decoding unit, configured to decode the code value of the data using an arithmetic coding algorithm to obtain a decoding interval; a scaling unit, configured to rescale the decoding interval of the data according to the re-expansion character position to obtain a rescaled decoding interval; and a second decoding unit, configured to continue to decode the data using the arithmetic coding algorithm according to the rescaled decoding interval, to obtain the data.
With reference to the seventh aspect, in a first possible implementation, the scaling unit determines a rescaling character position according to the re-expansion character position, where the re-expansion character position and the rescaling character position are mutually inverse, and rescales the decoding interval of the data according to the rescaling character position to obtain the rescaled decoding interval.
In an eighth aspect, a device for data processing is provided, including: a coding unit, configured to encode data using an arithmetic coding algorithm to obtain a code value interval; an acquiring unit, configured to obtain a code value from the code value interval; a storage unit, configured to store the code value; and an application unit, configured to perform an application operation on the data according to the code value, where the application operation includes at least one of equivalence comparison, sorting, and fuzzy query.
With reference to the eighth aspect, in a first possible implementation, the data is an ID string and the application operation includes equivalence comparison; when the code value is equal to a code value to be compared, the application unit determines that the data and the data corresponding to the code value to be compared are the same data.
With reference to the eighth aspect, in a third possible implementation, the data is an ID string or an alphabetic string of a field, and the application operation includes sorting; the application unit determines, according to the magnitude of the code value, the position of the code value among the code values to be sorted, where the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
With reference to the eighth aspect, in a fifth possible implementation, the application operation includes fuzzy query; the application unit determines whether the data includes a prefix string according to whether the code value lies within the coding interval of the prefix string of the required fuzzy query. When the code value lies within the coding interval of the prefix string of the required fuzzy query, the data includes the prefix string; when the code value does not lie within that coding interval, the data does not include the prefix string.
Based on the above technical solution, the embodiments of the present invention encode data using an arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, obtain the code value from the code value interval; compare the number of bits of the code value with the number of bits of the data to obtain a comparison result; and perform a storage operation according to the comparison result. The embodiments of the present invention can reduce the storage space of the data.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required in the embodiments are briefly described below. Apparently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is the schematic flow chart of the method according to an embodiment of the invention for data compression.
Fig. 2 is field sequence schematic diagram according to an embodiment of the invention.
Fig. 3 is fuzzy query schematic diagram according to an embodiment of the invention.
Fig. 4 is the schematic flow chart of the method for data processing according to another embodiment of the present invention.
Fig. 5 is the schematic flow chart of the method for data processing according to another embodiment of the present invention.
Fig. 6 is the schematic flow chart of the method for data processing according to another embodiment of the present invention.
Fig. 7 is the schematic flow chart of the method for data processing according to another embodiment of the present invention.
Fig. 8 is the schematic flow chart of the method for data processing according to another embodiment of the present invention.
Fig. 9 is the schematic flow chart of the method for data processing according to another embodiment of the present invention.
Figure 10 is the schematic flow chart of the method for data processing according to another embodiment of the present invention.
Figure 11 is the schematic flow chart of the method for data processing according to another embodiment of the present invention.
Figure 12 is the schematic block diagram of the equipment of data processing according to an embodiment of the invention.
Figure 13 is the schematic block diagram of the equipment of data processing according to another embodiment of the present invention.
Figure 14 is the schematic block diagram of the equipment of data processing according to another embodiment of the present invention.
Figure 15 is the schematic block diagram of the equipment of data processing according to another embodiment of the present invention.
Figure 16 is the schematic block diagram of the equipment of data processing according to another embodiment of the present invention.
Figure 17 is the schematic block diagram of the equipment of data processing according to another embodiment of the present invention.
Figure 18 is the schematic block diagram of the equipment of data processing according to another embodiment of the present invention.
Figure 19 is the schematic block diagram of the equipment of data processing according to another embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a method for data compression according to an embodiment of the present invention. The method shown in Fig. 1 may be executed by the data compression device in Fig. 4, and includes:
110, encode data using an arithmetic coding algorithm to obtain a code value interval.
Specifically, the code value interval may be the coding interval corresponding to the last character of the data.
It should be understood that the data in the embodiments of the present invention may be an identity (ID) string, an alphabetic string of a field, or the like. ID strings may include data order numbers, book index numbers, bus license plate numbers, product document numbers, the International Mobile Subscriber Identification Number (IMSI), the International Mobile Equipment Identity (IMEI), and so on; alphabetic strings may include the phonetic notation corresponding to Chinese, Korean, Japanese, and so on.
120, when a code value corresponding to the data exists in the code value interval, obtain the code value from the code value interval.
In other words, when a suitable code value can be obtained from the code value interval, the code value is obtained from the code value interval. For example, a suitable code value may be one that satisfies a digit limit, such as a code value whose number of binary digits does not exceed 16, 32, or 64.
It should be noted that the code value may be obtained from the code value interval by an existing method or according to a preset condition, for example that the number of binary digits of the code value does not exceed 16, 32, or 64. The embodiments of the present invention do not limit this.
130. Compare the number of bits of the code value with the number of bits of the data to obtain a comparison result.
Specifically, the comparison result may be that the number of bits of the code value is less than, equal to, or greater than the number of bits of the data.
140. Perform a storage operation according to the comparison result.
Specifically, the data is encoded using the arithmetic coding algorithm to obtain a code value interval. It is then determined whether a suitable code value can be obtained from that code value interval (the coding interval corresponding to the last character of the data). If not, the data is stored directly, without arithmetic coding. If a suitable code value can be obtained, the number of bits of the code value is compared with the number of bits of the data (that is, the compression gain is evaluated). If the number of bits of the suitable code value is greater than or equal to the number of bits of the original data, there is no compression gain, so arithmetic coding is abandoned and the data is stored directly. If the number of bits of the suitable code value is less than the number of bits of the original data, there is a compression gain, and the code value is stored.
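The flow of steps 110 to 140 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `encode_interval`, `pick_code_value`, and `store` are hypothetical, exact fractions stand in for a fixed-precision coder, a "suitable" code value is taken to be the shortest binary fraction inside the interval that fits in `max_bits`, and the bit count of the data is assumed to be 8 bits per UTF-8 byte.

```python
from fractions import Fraction
from math import ceil


def encode_interval(text, probs):
    """Step 110: return the final [low, high) interval of `text`."""
    cum, acc = {}, Fraction(0)
    for sym in sorted(probs):          # cumulative probability of each symbol
        cum[sym] = acc
        acc += probs[sym]
    low, width = Fraction(0), Fraction(1)
    for ch in text:                    # narrow the interval symbol by symbol
        low += width * cum[ch]
        width *= probs[ch]
    return low, low + width


def pick_code_value(low, high, max_bits):
    """Step 120: shortest (value, bits) with value / 2**bits in [low, high)."""
    for bits in range(1, max_bits + 1):
        value = ceil(low * 2**bits)    # smallest candidate that is >= low
        if Fraction(value, 2**bits) < high:
            return value, bits
    return None                        # no suitable code value exists


def store(text, probs, max_bits=32):
    """Steps 130 and 140: keep the code value only if it saves bits."""
    low, high = encode_interval(text, probs)
    picked = pick_code_value(low, high, max_bits)
    raw_bits = 8 * len(text.encode("utf-8"))   # assumed size of the raw data
    if picked is None or picked[1] >= raw_bits:
        return ("raw", text)           # no code value, or no compression gain
    return ("code", picked)
```

With a uniform model over the ten digits, an 11-digit string has no code value of 16 bits or fewer and is stored raw, while allowing 64 bits yields a code value much shorter than the 88 raw bits.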
Therefore, the embodiment of the present invention encodes the data using an arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, obtains the code value from the interval; compares the number of bits of the code value with the number of bits of the data to obtain a comparison result; and performs a storage operation according to the comparison result. The embodiment of the present invention can thereby reduce erroneous encoding and reduce the storage space required for the data.
Optionally, as another embodiment, the comparison result is that the number of bits of the code value is less than the number of bits of the data, and in 140 the code value is stored according to the comparison result.
Specifically, when the number of bits of the code value is less than the number of bits of the data, there is a compression gain, and the code value is stored.
Alternatively, as another embodiment, the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data, and in 140 the data is stored according to the comparison result.
Specifically, when the number of bits of the code value is greater than or equal to the number of bits of the data, there is no compression gain, and the data is stored.
Optionally, as another embodiment, when the code value is stored, the method of the embodiment of the present invention further includes performing an application operation on the data according to the code value, where the application operation includes at least one of equality comparison, sorting, and fuzzy query.
Specifically, the data is encoded using the arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, the code value is obtained from the interval; the number of bits of the code value is compared with the number of bits of the data to obtain a comparison result; when the comparison result is that the number of bits of the code value is less than the number of bits of the data, the code value is stored, and at least one of equality comparison, sorting, and fuzzy query is performed on the data according to the code value. For example, an equality comparison of the data, a sorting of the data, or a fuzzy query on the data may be performed according to the code value. Each of these application operations is described below.
Specifically, as another embodiment, the data is an ID class character string and the application operation includes equality comparison. Performing the application operation on the data according to the code value includes: performing an equality comparison of the data according to the code value.
Further, as another embodiment, performing the equality comparison of the data according to the code value includes: when the code value is equal to a code value to be compared, determining that the data and the data corresponding to the code value to be compared are the same data.
It should be understood that the code value to be compared is the code value of the data to be compared (the data corresponding to the code value to be compared). Specifically, an equality comparison of code values can be understood as matching the data. For example, when the code values of two pieces of data are equal, the two pieces of data can be determined to be the same data, that is, the match succeeds; when the two code values are not equal, the two pieces of data can be determined to be different data, that is, the match fails.
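As a rough illustration of matching by code value, the sketch below compares two ID strings through their code values instead of character by character. The names `code_value` and `same_record` are hypothetical, and the sketch assumes fixed-length, digits-only IDs, a uniform probability model, and the lower bound of the final interval as the code value; none of these choices is prescribed by the text.

```python
from fractions import Fraction

DIGITS = "0123456789"


def code_value(id_string):
    """Lower bound of the arithmetic-coding interval under a uniform digit model."""
    low, width = Fraction(0), Fraction(1)
    for ch in id_string:
        width /= len(DIGITS)               # each digit has probability 1/10
        low += width * DIGITS.index(ch)    # move to the sub-interval of this digit
    return low


def same_record(code_a, code_b):
    """Equal code values are treated as identical data (the match succeeds)."""
    return code_a == code_b


stored = code_value("460001234567890")
print(same_record(stored, code_value("460001234567890")))
print(same_record(stored, code_value("460001234567891")))
```

The fixed-length assumption matters: strings of the same length occupy disjoint intervals, so distinct IDs cannot share a code value, whereas a string and one of its prefixes have nested intervals.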
Alternatively, as another embodiment, the data is an ID class character string or an alphabetic character string of a field, and the application operation includes sorting. Performing the application operation on the data according to the code value includes: sorting the data according to the code value.
Further, as another embodiment, sorting the data according to the code value includes: determining, according to the magnitude of the code value, the position of the code value among the code values to be sorted, where the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
Specifically, sorting the data can be understood as ordering multiple pieces of data. For example, suppose there are 5 pieces of data with 5 corresponding code values, and the 5 code values are sorted in ascending order. If the current code value is 4th among the 5 code values, then the data corresponding to that code value is 4th among the 5 pieces of data.
In existing database implementations, compression techniques and query operations are usually considered separately; that is, data storage techniques and the related query optimization techniques are treated in isolation. An important function of a database is to store important descriptions of the things of interest, their development, and other related information. When the description of a thing is lengthy, it therefore not only occupies a large amount of storage space but also makes queries inconvenient. In the prior art, when various query operations (such as string comparison and string sorting) are performed on the data, the characters in the strings must be compared one by one, which makes queries inefficient. The embodiment of the present invention, by contrast, requires no arithmetic decoding: it compares (matches) data directly according to the corresponding code values and/or sorts the data according to the code values, turning a query on an originally complex data type into an equivalent query on code values, which is fast and convenient.
For example, consider an application scenario in which only digits or only letters occur in the data (in which case the probability is assigned entirely to digits or entirely to letters). Suppose the data is text in a non-Latin script. Chinese, Korean, and Japanese all have corresponding phonetic notations, so the text can be converted into its phonetic representation, that is, a character string containing only letters. The data is then encoded using the arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, it is determined that the number of bits of the code value is less than the number of bits of the data; and the code value is stored. The field corresponding to the data is then sorted according to the code value.
For example, the occurrence probability distribution of the letters in Chinese pinyin spelling is: A (0.107), B (0.014), C (0.017), D (0.030), E (0.062), F (0.009), G (0.060), H (0.067), I (0.141), J (0.023), K (0.008), L (0.017), M (0.014), N (0.117), O (0.065), P (0.008), Q (0.013), R (0.006), S (0.026), T (0.015), U (0.096), V (0.001), W (0.010), X (0.020), Y (0.028), Z (0.026). As shown in Fig. 2, the field values "excellent", "good", and "pass" are arranged in alphabetical order and encoded. The alphabetic character string corresponding to "excellent" is "youxiu", with code value 0.96684845; the alphabetic character string corresponding to "good" is "lianghao", with code value 0.544375656; and the alphabetic character string corresponding to "pass" is "jige", with code value 0.516228. Sorted in ascending order, the code values are 0.516228, 0.544375656, and 0.96684845, corresponding to "pass", "good", and "excellent" respectively.
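The ordering in this example can be checked with a short sketch that uses the letter probabilities listed above. The sketch assumes that the cumulative probabilities follow alphabetical order and that the lower bound of the final interval serves as the sort key; the names `interval` and `sort_key` are hypothetical. It does not attempt to reproduce the exact code values of Fig. 2, which depend on how a value is chosen inside each interval.

```python
from fractions import Fraction

LETTERS = "abcdefghijklmnopqrstuvwxyz"
# Probabilities from the text, in thousandths, for A through Z.
THOUSANDTHS = [107, 14, 17, 30, 62, 9, 60, 67, 141, 23, 8, 17, 14,
               117, 65, 8, 13, 6, 26, 15, 96, 1, 10, 20, 28, 26]
PROB = {c: Fraction(t, 1000) for c, t in zip(LETTERS, THOUSANDTHS)}

CUM = {}
_acc = Fraction(0)
for _c in LETTERS:                 # cumulative probability in alphabetical order
    CUM[_c] = _acc
    _acc += PROB[_c]


def interval(pinyin):
    """Return the final [low, high) coding interval of a pinyin string."""
    low, width = Fraction(0), Fraction(1)
    for ch in pinyin:
        low += width * CUM[ch]
        width *= PROB[ch]
    return low, low + width


def sort_key(pinyin):
    """Use the lower bound of the interval as the code value for sorting."""
    return interval(pinyin)[0]


ordered = sorted(["youxiu", "lianghao", "jige"], key=sort_key)
print(ordered)  # ['jige', 'lianghao', 'youxiu']
```

Because the cumulative distribution is built in alphabetical order, ascending code values correspond to alphabetical order of the strings, so no decoding is needed to sort the field.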
Alternatively, as another embodiment, the application operation includes fuzzy query. Performing the application operation on the data according to the code value includes: performing a fuzzy query on the data according to the code value.
Further, as another embodiment, performing the fuzzy query on the data according to the code value includes: determining whether the data contains a prefix character string according to whether the code value lies in the coding interval of the prefix character string of the fuzzy query. When the code value lies in the coding interval of the prefix character string of the fuzzy query, the data contains the prefix character string; when it does not, the data does not contain the prefix character string.
In other words, when the code value lies in the coding interval of the prefix character string of the fuzzy query, the data satisfies the fuzzy query; when it does not, the data does not satisfy the fuzzy query.
Specifically, after character strings are compressed by arithmetic coding, the result is a series of code values. Each code value comes from the coding interval obtained by encoding the character string, and distinct coding intervals do not overlap one another. Note also that the coding interval of a character string is always contained in the coding interval of each of its prefix character strings. For example, the coding interval of the character string 'A12986572' must be contained in the coding intervals of prefix strings such as 'A1298' and 'A12'. A fuzzy query can therefore be performed simply by checking whether the code value lies in the coding interval of the prefix character string being queried. For example, take the integer symbol set {0, 1, 2, 3} with occurrence probability distribution {0.2, 0.5, 0.2, 0.1}, and apply the fuzzy query %210xxx to the data 212132, 210312, 210231, and 211123. As shown in Fig. 3, the code value interval of "210" is [0.74, 0.76], and the code values corresponding to 212132, 210312, 210231, and 211123 are 0.8238, 0.7592, 0.7576, and 0.7923 respectively. Because 0.7592 and 0.7576 lie in the coding interval [0.74, 0.76] of the prefix character string of the fuzzy query, while 0.8238 and 0.7923 do not, 210312 and 210231 satisfy the fuzzy query and 212132 and 211123 do not.
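The prefix test in this example can be sketched with exact fractions. The symbol set and probabilities are those given in the text; taking the midpoint of the final interval as the code value is an assumption made here for illustration (it happens to give 0.7592 for 210312 and 0.8238 for 212132), and the names `interval`, `code_value`, and `has_prefix` are hypothetical.

```python
from fractions import Fraction

SYMBOLS = "0123"
PROBS = dict(zip(SYMBOLS, (Fraction(1, 5), Fraction(1, 2),
                           Fraction(1, 5), Fraction(1, 10))))
CUM = {}
_acc = Fraction(0)
for _s in SYMBOLS:                 # cumulative probability of each symbol
    CUM[_s] = _acc
    _acc += PROBS[_s]


def interval(text):
    """Return the [low, high) coding interval of `text`."""
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        low += width * CUM[ch]
        width *= PROBS[ch]
    return low, low + width


def code_value(text):
    """Assumed choice of code value: the midpoint of the final interval."""
    low, high = interval(text)
    return (low + high) / 2


def has_prefix(value, prefix):
    """Fuzzy (prefix) query: is the code value inside the prefix interval?"""
    low, high = interval(prefix)
    return low <= value < high


stored = {d: code_value(d) for d in ("212132", "210312", "210231", "211123")}
matches = sorted(d for d, v in stored.items() if has_prefix(v, "210"))
print(matches)  # ['210231', '210312']
```

Each stored value is tested with two comparisons against the bounds of the prefix interval, regardless of how long the prefix is.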
Therefore, in fuzzy query operations, once the number of characters being searched for exceeds 2, the embodiment of the present invention already offsets the cost of the operations needed to determine the interval.
Optionally, as another embodiment, in 110 the data is encoded using the arithmetic coding algorithm to obtain a coding interval; the coding interval of the data is re-expanded to obtain a re-expanded coding interval, and the corresponding re-expansion character position is recorded; and, based on the re-expanded coding interval, encoding of the data continues with the arithmetic coding algorithm to obtain the code value interval. The recorded re-expansion character position is used by the decoding device to determine, from the code value and the re-expansion character position, the character position for rescaling, and to rescale the interval at that character position during decoding so as to finally recover the data.
Specifically, the embodiment of the present invention may re-expand the coding interval corresponding to any character, or may re-expand the coding interval according to a preset condition.
Therefore, the embodiment of the present invention can re-expand the coding interval of the data. Because the coding interval is re-expanded, the code value interval is enlarged correspondingly, so a suitable code value is easier to obtain from the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, interval re-expansion makes it possible to represent a sufficiently long character string in a space with a limited number of digits.
Further, as another embodiment, in 110, when at least one of the following conditions is met, the coding interval of the data is re-expanded to obtain the re-expanded coding interval: the number of characters of the data that have been encoded reaches a character count threshold; and the length of the coding interval of the data is less than an interval threshold.
Specifically, when the number of characters in the data exceeds the preset character count threshold, in 110 the data is encoded using the arithmetic coding algorithm; when the number of characters that have been encoded reaches the preset character count threshold, the coding interval of the character corresponding to the preset character count threshold is re-expanded and the corresponding re-expansion character position is recorded; then, based on the re-expanded coding interval, encoding of the data continues with the arithmetic coding algorithm to obtain the code value interval.
In other words, as the character string of the data is encoded character by character using the arithmetic coding algorithm, once the number of encoded characters reaches the preset character count threshold, the coding interval of the character at that threshold is re-expanded and the re-expansion character position is recorded; encoding then continues from the re-expanded coding interval to obtain the code value interval.
For example, suppose the current data is a 12-character string and the preset character count threshold is 10. When the 12 characters of the current data are encoded using the arithmetic coding algorithm, the coding interval of the 10th character is re-expanded and the re-expansion character position is recorded as the position of the 10th character. Then, based on the re-expanded coding interval, encoding continues with the remaining data (the 11th and 12th characters) using the arithmetic coding algorithm, finally yielding the code value interval (the coding interval corresponding to the 12th character).
As the character string of the data grows longer, the coding interval obtained by encoding becomes shorter and shorter, and a suitable code value is hard to obtain from a short coding interval. Therefore, to avoid as far as possible the situation in which arithmetic coding cannot be performed correctly, the embodiment of the present invention re-expands the coding interval when the number of characters reaches the preset character count threshold. Because the coding interval is re-expanded, the code value interval is enlarged correspondingly, so a suitable code value is easier to obtain from the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, interval re-expansion makes it possible to represent a sufficiently long character string in a space with a limited number of digits.
When the length of the coding interval of the data is less than a preset threshold, in 110 the coding interval of the data is re-expanded; based on the re-expanded coding interval, encoding of the data continues with the arithmetic coding algorithm to obtain the code value interval.
In other words, as the character string of the data is encoded character by character using the arithmetic coding algorithm, the coding interval becomes smaller and smaller. When the coding interval falls below the preset threshold, that coding interval is re-expanded; encoding of the remaining characters of the data then continues from the re-expanded coding interval using the arithmetic coding algorithm, finally yielding the code value interval.
For example, suppose the current data is a 12-character string and the preset threshold is 0.05. When the 12 characters of the current data are encoded using the arithmetic coding algorithm, the coding interval of the data is re-expanded whenever it falls below 0.05. For example, if the coding interval of the 7th character has length 0.1 and that of the 8th character has length 0.04, the coding interval of the 8th character (0.04) is re-expanded, for example from 0.04 to 1 or 10. Encoding of the remaining data (the 9th to 12th characters) then continues from the re-expanded coding interval (1 or 10) using the arithmetic coding algorithm, yielding the code value interval (the coding interval corresponding to the 12th character).
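The two trigger conditions described above can be illustrated with a sketch that tracks only the length of the coding interval and records the character positions at which a re-expansion would occur. Several things here are assumptions rather than statements from the text: the name `reexpansion_positions` is hypothetical, the interval is expanded back to length 1 (one of the example targets mentioned above), the character count is measured from the most recent re-expansion, and the remapping of the interval bounds, which the text does not specify, is left out entirely.

```python
from fractions import Fraction


def reexpansion_positions(text, probs, count_threshold=None, width_threshold=None):
    """Return the 1-based character positions at which re-expansion is triggered."""
    width, since_last, positions = Fraction(1), 0, []
    for pos, ch in enumerate(text, start=1):
        width *= probs[ch]             # interval shrinks with every character
        since_last += 1
        hit_count = count_threshold is not None and since_last >= count_threshold
        hit_width = width_threshold is not None and width < width_threshold
        if hit_count or hit_width:     # "at least one of the conditions"
            positions.append(pos)      # record the re-expansion character position
            width, since_last = Fraction(1), 0   # assumed: expand back to length 1
    return positions


probs = {"0": Fraction(1, 5), "1": Fraction(1, 2),
         "2": Fraction(1, 5), "3": Fraction(1, 10)}
print(reexpansion_positions("1" * 12, probs, count_threshold=10))
print(reexpansion_positions("1" * 12, probs, width_threshold=Fraction(1, 20)))
```

For a 12-character string with a count threshold of 10, the only recorded position is the 10th character, as in the example above. With a width threshold of 0.05 and a symbol of probability 0.5, the interval length first drops below the threshold at the 5th character and again at the 10th.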
As the character string of the data grows longer, the coding interval obtained by encoding becomes shorter and shorter, and a suitable code value is hard to obtain from a short coding interval. Therefore, to avoid as far as possible the situation in which arithmetic coding cannot be performed correctly, the embodiment of the present invention re-expands the coding interval. Because the coding interval is re-expanded, the code value interval is enlarged correspondingly, so a suitable code value is easier to obtain from the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, interval re-expansion makes it possible to represent a sufficiently long character string in a space with a limited number of digits.
It should be noted that all information about the interval expansion must also be passed to the decoder; that is, when the encoder sends the binary code value to the decoder, it also sends the displacement scheme information. This keeps the information synchronized and ensures that the correct result is obtained on decoding.
Fig. 4 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method shown in Fig. 4 may be executed by a data processing device, specifically by an encoding device. As shown in Fig. 4, the method includes:
410. Encode the data using an arithmetic coding algorithm to obtain a coding interval.
420. Re-expand the coding interval of the data to obtain a re-expanded coding interval.
Specifically, the embodiment of the present invention may re-expand the coding interval corresponding to any character, or may re-expand the coding interval according to a preset condition.
430. Based on the re-expanded coding interval, continue encoding the data using the arithmetic coding algorithm to obtain a code value interval.
440. Obtain a code value from the code value interval.
450. Store the code value.
Therefore, the embodiment of the present invention re-expands the coding interval of the data. Because the coding interval is re-expanded, the code value interval is enlarged correspondingly, so a suitable code value is easier to obtain from the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, interval re-expansion makes it possible to represent a sufficiently long character string in a space with a limited number of digits.
Further, as another embodiment, in 420, when at least one of the following conditions is met, the coding interval of the data is re-expanded to obtain the re-expanded coding interval, and the corresponding re-expansion character position is recorded: the number of characters of the data that have been encoded reaches a character count threshold; and the length of the coding interval of the data is less than an interval threshold. The recorded re-expansion character position is used by the decoding device to determine, from the code value and the re-expansion character position, the character position for rescaling, and to rescale the interval at that character position during decoding so as to finally recover the data.
Specifically, when the number of characters in the data exceeds the preset character count threshold, the data is encoded using the arithmetic coding algorithm; when the number of characters that have been encoded reaches the preset character count threshold, the coding interval of the character corresponding to the preset character count threshold is re-expanded; then, based on the re-expanded coding interval, encoding of the data continues with the arithmetic coding algorithm to obtain the code value interval.
In other words, as the character string of the data is encoded character by character using the arithmetic coding algorithm, once the number of encoded characters reaches the preset character count threshold, the coding interval of the character at that threshold is re-expanded; encoding then continues from the re-expanded coding interval to obtain the code value interval.
For example, suppose the current data is a 12-character string and the preset character count threshold is 10. When the 12 characters of the current data are encoded using the arithmetic coding algorithm, the coding interval of the 10th character is re-expanded and the re-expansion character position is recorded as the position of the 10th character. Then, based on the re-expanded coding interval, encoding continues with the remaining data (the 11th and 12th characters) using the arithmetic coding algorithm, finally yielding the code value interval (the coding interval corresponding to the 12th character).
As the character string of the data grows longer, the coding interval obtained by encoding becomes shorter and shorter, and a suitable code value is hard to obtain from a short coding interval. Therefore, to avoid as far as possible the situation in which arithmetic coding cannot be performed correctly, the embodiment of the present invention re-expands the coding interval when the number of characters reaches the preset character count threshold. Because the coding interval is re-expanded, the code value interval is enlarged correspondingly, so a suitable code value is easier to obtain from the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, interval re-expansion makes it possible to represent a sufficiently long character string in a space with a limited number of digits.
When the length of the coding interval of the data is less than a preset threshold, the coding interval of the data is re-expanded; based on the re-expanded coding interval, encoding of the data continues with the arithmetic coding algorithm to obtain the code value interval.
In other words, as the character string of the data is encoded character by character using the arithmetic coding algorithm, the coding interval becomes smaller and smaller. When the coding interval falls below the preset threshold, that coding interval is re-expanded; encoding of the remaining characters of the data then continues from the re-expanded coding interval using the arithmetic coding algorithm, finally yielding the code value interval.
For example, suppose the current data is a 12-character string and the preset threshold is 0.05. When the 12 characters of the current data are encoded using the arithmetic coding algorithm, the coding interval of the data is re-expanded whenever it falls below 0.05. For example, if the coding interval of the 7th character has length 0.1 and that of the 8th character has length 0.04, the coding interval of the 8th character (0.04) is re-expanded, for example from 0.04 to 1 or 10. Encoding of the remaining data (the 9th to 12th characters) then continues from the re-expanded coding interval (1 or 10) using the arithmetic coding algorithm, yielding the code value interval (the coding interval corresponding to the 12th character).
As the character string of the data grows longer, the coding interval obtained by encoding becomes shorter and shorter, and a suitable code value is hard to obtain from a short coding interval. Therefore, to avoid as far as possible the situation in which arithmetic coding cannot be performed correctly, the embodiment of the present invention re-expands the coding interval. Because the coding interval is re-expanded, the code value interval is enlarged correspondingly, so a suitable code value is easier to obtain from the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, interval re-expansion makes it possible to represent a sufficiently long character string in a space with a limited number of digits.
It should be noted that all information about the interval expansion must also be passed to the decoder; that is, when the encoder sends the binary code value to the decoder, it also sends the displacement scheme information. This keeps the information synchronized and ensures that the correct result is obtained on decoding.
Fig. 5 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method shown in Fig. 5 may be executed by a data processing device, specifically by a decoding device. As shown in Fig. 5, the method includes:
510. Obtain the code value of the data and the re-expansion character position.
520. Decode the code value of the data using the arithmetic coding algorithm to obtain a decoding interval.
530. Rescale the decoding interval of the data according to the re-expansion character position to obtain a rescaled decoding interval.
540. Based on the rescaled decoding interval, continue decoding using the arithmetic coding algorithm to obtain the data.
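Setting re-expansion aside, the decoding in step 520 is ordinary arithmetic decoding, which can be sketched as below. This is an illustration under assumptions rather than the decoder of Fig. 5: it omits the rescaling of steps 530 and 540, assumes the decoder knows the length of the data, reuses the four-symbol model from the fuzzy query example, takes the midpoint of the final interval as the code value, and uses the hypothetical names `code_value` and `decode`.

```python
from fractions import Fraction

SYMBOLS = "0123"
PROBS = dict(zip(SYMBOLS, (Fraction(1, 5), Fraction(1, 2),
                           Fraction(1, 5), Fraction(1, 10))))
CUM = {}
_acc = Fraction(0)
for _s in SYMBOLS:                 # cumulative probability of each symbol
    CUM[_s] = _acc
    _acc += PROBS[_s]


def code_value(text):
    """Encode `text` and return the midpoint of its final interval."""
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        low += width * CUM[ch]
        width *= PROBS[ch]
    return low + width / 2


def decode(value, length):
    """Recover `length` symbols from a code value in [0, 1)."""
    out = []
    for _ in range(length):
        for s in SYMBOLS:
            if CUM[s] <= value < CUM[s] + PROBS[s]:
                out.append(s)
                # Map the value back onto [0, 1) relative to this symbol.
                value = (value - CUM[s]) / PROBS[s]
                break
    return "".join(out)


print(decode(code_value("210312"), 6))  # 210312
```

Each decoding step locates the symbol whose sub-interval contains the current value and then maps that sub-interval back onto [0, 1) before reading the next symbol.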
Therefore, for a code value produced from a re-expanded coding interval, the embodiment of the present invention rescales the decoding interval, which avoids erroneous decoding and achieves correct decoding.
Specifically, as another embodiment, in 530 the rescaling character position is determined from the re-expansion character position, where the re-expansion character position and the rescaling character position mirror each other; the decoding interval of the data is then rescaled according to the rescaling character position to obtain the rescaled decoding interval. As on the encoding side, the conditions involved are: the number of characters of the data that have been encoded reaches the character count threshold; and the length of the coding interval of the data is less than the interval threshold.
It should be understood that the re-expansion character position and the rescaling character position mirror each other; in other words, they are opposite (or symmetric). For example, if the current data is a 12-character string and the coding interval of the 10th character was re-expanded, then the re-expansion character position is the position of the 10th character, and from it the rescaling character position can be determined to be the position of the 3rd character.
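One reading of this example is that the two positions are mirror images within the string: in a 12-character string the 10th character from one end is the 3rd from the other. The sketch below encodes that reading. The formula is inferred from this single example and is therefore an assumption, and `rescale_position` is a hypothetical name.

```python
def rescale_position(total_chars, reexpansion_position):
    """Mirror a 1-based position within a string of `total_chars` characters."""
    if not 1 <= reexpansion_position <= total_chars:
        raise ValueError("position out of range")
    return total_chars - reexpansion_position + 1


print(rescale_position(12, 10))  # 3
```

Applying the function twice returns the original position, which matches the description of the two positions as mutual inverses.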
It should be understood that the data processing method shown in Fig. 5 corresponds to the data processing method shown in Fig. 4; the difference is that the decoding process shown in Fig. 5 is the inverse of the encoding process shown in Fig. 4. The method of Fig. 5 can be obtained by reversing the process of Fig. 4 and, to avoid repetition, is not described in further detail here.
Fig. 6 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method shown in Fig. 6 may be executed by a data processing device. As shown in Fig. 6, the method includes:
610. Encode the data using an arithmetic coding algorithm to obtain a code value interval.
620. Obtain a code value from the code value interval.
630. Store the code value.
640. Perform an application operation on the data according to the code value, where the application operation includes at least one of equality comparison, sorting, and fuzzy query.
Therefore, the embodiment of the present invention obtains a code value by encoding the data and, according to the code value, performs at least one of equality comparison, sorting, and fuzzy query on the data. Unlike existing approaches, which perform equality comparison, sorting, and fuzzy query on the source data, this turns the processing of originally complex data into equivalent processing of code values, which is fast and convenient.
Optionally, as another embodiment, the data is an ID class character string and the application operation includes equality comparison; in 640, an equality comparison of the data is performed according to the code value.
Further, as another embodiment, in 640, when the code value is equal to a code value to be compared, it is determined that the data and the data corresponding to the code value to be compared are the same data.
Specifically, an equality comparison of code values can be understood as matching the data. For example, when two code values are equal, the data corresponding to the two code values can be determined to be the same data, that is, the match succeeds; when the two code values are not equal, the data corresponding to the two code values can be determined to be different data, that is, the match fails.
Alternatively, as another embodiment, the data is an ID-type character string or the alphabetic character string of a field, and the application operation includes sorting; in 640, the data is sorted according to the code value.
Further, as another embodiment, in 640, the position of the code value among the code values to be sorted is determined according to the magnitude of the code value; the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
Specifically, sorting the data can be understood as ordering multiple pieces of data. For example, suppose there are 5 pieces of data with 5 corresponding code values, and the 5 code values are sorted in ascending order. If the current code value is the 4th of the 5 code values, then the data corresponding to that code value is the 4th of the 5 pieces of data.
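Both operations reduce to ordinary comparisons on stored code values. The following Python sketch illustrates this under stated assumptions: it reuses the four data and code value pairs quoted in this document's integer-symbol example, and the helper names `same_data` and `rank_of` are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

# Data -> stored code value, as quoted in the document's integer-symbol example.
stored = {
    "212132": Decimal("0.8238"),
    "210312": Decimal("0.7592"),
    "210231": Decimal("0.7576"),
    "211123": Decimal("0.7923"),
}

def same_data(code_a, code_b):
    """Equality comparison: equal code values are taken to mean identical data."""
    return code_a == code_b

def rank_of(code, codes):
    """1-based position of `code` among `codes` sorted in ascending order."""
    return sorted(codes).index(code) + 1

# Sorting the data means sorting the keys by their code values.
ordered = sorted(stored, key=stored.get)
print(ordered)
print(rank_of(stored["211123"], stored.values()))
```

No decoding is involved: the comparison and the sort touch only the numeric code values.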
In existing database implementations, compression technology and query operations are usually considered separately; that is, data storage technology and the related query optimization technology are each considered on their own. An important function of a database is to store important descriptions of the things of interest, information about how those things develop, and other related information. Therefore, when the description of a thing is lengthy, it not only occupies a large amount of storage space but also makes queries inconvenient. In the prior art, when various query operations (such as string comparison and string sorting) are performed on data, the characters in the strings have to be compared one by one, which leads to low query efficiency. The embodiment of the present invention, by contrast, needs no arithmetic decoding: the comparison (matching) of data is carried out directly according to the code values corresponding to the data, and/or the data is sorted according to the code values. Query operations on complex data types are thus converted into equivalent queries on code values, which is quick and convenient.
For example, consider an application scenario in which only digits or only letters occur in the data (in this case the probability is assigned entirely to the digits or entirely to the letters). For example, the data is text in a non-Latin script; Chinese, Korean, and Japanese all have corresponding phonetic notations. The text can be converted into its phonetic representation, that is, a character string containing only letters. The data is then encoded using the arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, it is determined that the bit number of the code value is less than the bit number of the data; and the code value is stored. The fields corresponding to the data are sorted according to the code values.
For example, the probability distribution of the letters in Chinese pinyin (spelling) is: A (0.107), B (0.014), C (0.017), D (0.030), E (0.062), F (0.009), G (0.060), H (0.067), I (0.141), J (0.023), K (0.008), L (0.017), M (0.014), N (0.117), O (0.065), P (0.008), Q (0.013), R (0.006), S (0.026), T (0.015), U (0.096), V (0.001), W (0.010), X (0.020), Y (0.028), Z (0.026). As shown in Fig. 2, the fields "outstanding", "good" and "pass" are encoded with the letters arranged in alphabetical order. The alphabetic character string corresponding to "outstanding" is "youxiu", and its code value is 0.96684845; the alphabetic character string corresponding to "good" is "lianghao", and its code value is 0.544375656; the alphabetic character string corresponding to "pass" is "jige", and its code value is 0.516228. Sorted in ascending order, the code values are 0.516228, 0.544375656 and 0.96684845, corresponding to "pass", "good" and "outstanding", respectively.
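The ordering in this example can be reproduced with a plain arithmetic coder. The Python sketch below is a simplified illustration, not the embodiment's exact procedure: it assumes each letter's sub-interval is laid out in alphabetical order using the probabilities quoted above, uses exact fractions instead of fixed-width arithmetic, and sorts by the lower bound of each coding interval. It only aims to reproduce the ordering, not the exact code values quoted above. The names `PROBS`, `CUM` and `coding_interval` are hypothetical.

```python
from fractions import Fraction

# Letter probabilities quoted in the document, in alphabetical order (a..z).
PROBS = dict(zip("abcdefghijklmnopqrstuvwxyz", map(Fraction, (
    "0.107 0.014 0.017 0.030 0.062 0.009 0.060 0.067 0.141 0.023 0.008 0.017 0.014 "
    "0.117 0.065 0.008 0.013 0.006 0.026 0.015 0.096 0.001 0.010 0.020 0.028 0.026"
).split())))

# Cumulative lower bound of each letter's sub-interval within [0, 1).
CUM, _acc = {}, Fraction(0)
for _ch, _p in PROBS.items():
    CUM[_ch] = _acc
    _acc += _p

def coding_interval(text):
    """Return the half-open coding interval (low, high) of `text`."""
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        low += width * CUM[ch]   # move to the start of this letter's sub-interval
        width *= PROBS[ch]       # shrink by the letter's probability
    return low, low + width

words = ["youxiu", "lianghao", "jige"]
ordered = sorted(words, key=lambda w: coding_interval(w)[0])
print(ordered)
```

Because the sub-intervals follow alphabetical order, sorting by code value coincides with sorting the alphabetic strings themselves.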
Alternatively, as another embodiment, the application operation includes fuzzy query; in 640, a fuzzy query is performed on the data according to the code value.
Further, as another embodiment, in 640, whether the data includes a prefix character string is determined according to whether the code value lies in the coding interval of the prefix character string of the required fuzzy query. When the code value lies in the coding interval of the prefix character string of the required fuzzy query, the data includes the prefix character string; when the code value does not lie in that coding interval, the data does not include the prefix character string.
In other words, when the code value lies in the coding interval of the prefix character string of the required fuzzy query, the data satisfies the fuzzy query; when it does not, the data does not satisfy the fuzzy query.
Specifically, after character strings are compressed by arithmetic coding, a series of code values is obtained. Each code value comes from the coding interval obtained by encoding the corresponding character string, and the coding intervals are mutually disjoint. Note also that the coding interval of a character string is always contained in the coding interval of each of its prefix character strings. For example, the coding interval of the character string 'A12986572' is necessarily contained in the coding intervals of prefix character strings such as 'A1298' and 'A12'. A fuzzy query can therefore be carried out simply by judging whether the code value lies in the coding interval of the prefix character string of the required fuzzy query. For example, take the integer symbol set {0, 1, 2, 3} with probability distribution {0.2, 0.5, 0.2, 0.1}, and apply the fuzzy query %210xxx to the data 212132, 210312, 210231 and 211123. As shown in Fig. 3, the code value interval of "210" is [0.74, 0.76], and the code values corresponding to 212132, 210312, 210231 and 211123 are 0.8238, 0.7592, 0.7576 and 0.7923, respectively. Since 0.7592 and 0.7576 lie in the coding interval [0.74, 0.76] of the prefix character string of the fuzzy query, while 0.8238 and 0.7923 do not, 210312 and 210231 satisfy the fuzzy query, and 212132 and 211123 do not.
Therefore, in the fuzzy query operation of the embodiment of the present invention, once the number of index characters exceeds 2, the saving already offsets the computation needed to determine the interval.
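The prefix test above amounts to one interval computation plus one range check per stored code value. The Python sketch below works through the document's integer example; it is a sketch under assumptions, with hypothetical names (`MODEL`, `coding_interval`, `matches_prefix`), and it treats intervals as half-open [low, high), whereas the text writes the interval of "210" as [0.74, 0.76].

```python
from fractions import Fraction

# Symbol probabilities from the document's example: {0, 1, 2, 3} -> {0.2, 0.5, 0.2, 0.1}.
MODEL = {"0": Fraction("0.2"), "1": Fraction("0.5"),
         "2": Fraction("0.2"), "3": Fraction("0.1")}

def coding_interval(text, model=MODEL):
    """Return the half-open coding interval (low, high) of `text`."""
    cum, acc = {}, Fraction(0)
    for sym in sorted(model):      # sub-intervals laid out in symbol order
        cum[sym] = acc
        acc += model[sym]
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        low += width * cum[ch]
        width *= model[ch]
    return low, low + width

def matches_prefix(code_value, prefix):
    """True if the code value lies in the coding interval of `prefix`."""
    low, high = coding_interval(prefix)
    return low <= code_value < high

# Stored code values quoted in the document.
codes = {"212132": Fraction("0.8238"), "210312": Fraction("0.7592"),
         "210231": Fraction("0.7576"), "211123": Fraction("0.7923")}

hits = [data for data, code in codes.items() if matches_prefix(code, "210")]
print(coding_interval("210"), hits)
```

The interval of the prefix is computed once and reused for every stored code value, so the per-record cost is two numeric comparisons rather than a character-by-character match.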
The data processing methods of the embodiments of the present invention have been described in detail above with reference to Fig. 1 to Fig. 6. The embodiments are described in more detail below with reference to the specific examples of Fig. 7 to Fig. 11. It should be noted that the examples of Fig. 7 to Fig. 11 are merely intended to help those skilled in the art understand the embodiments of the present invention, and are not intended to limit the embodiments to the specific values or specific scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or variations based on the examples of Fig. 7 to Fig. 11, and such modifications or variations also fall within the scope of the embodiments of the present invention.
Fig. 7 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method in Fig. 7 shows the process of evaluating, according to the compression benefit, whether to use arithmetic coding. The method shown in Fig. 7 includes:
710, probability model estimation.
720, input the sequence.
Specifically, the sequence can be the character string sequence of the source data, for example a data type that combines digits and letters in a specific way, or a digits-only or letters-only data type. For example, the input sequence can be a data sheet number, a book index number, a bus license plate number, a product document number, an IMSI or an IMEI, or the phonetic notation corresponding to Chinese, Korean, Japanese, and so on.
730, coding interval.
Specifically, the coding interval of the source data is determined according to the arithmetic coding algorithm.
740, select a code value.
Specifically, it is determined whether a code value corresponding to the data exists in the coding interval, in other words, whether a code value can be selected. If so, step 750 is performed; if no code value can be selected, step 790 is performed.
750, benefit evaluation.
Specifically, when a code value corresponding to the data exists in the code value interval, it is determined whether the bit number of the code value meets the requirement, for example, whether the bit number of the code value is less than the bit number of the data.
760, whether the requirement is met.
If the requirement is met, step 770 is performed; otherwise, step 780 is performed.
770, application.
Specifically, operations such as equality comparison, sorting, and fuzzy query can be carried out according to the code value.
780, abandon.
Specifically, arithmetic coding is abandoned.
790, abandon.
Specifically, arithmetic coding is abandoned.
Specifically, the embodiment of the present invention can determine whether a suitable code value can be obtained from the final coding interval; if not, the data is stored directly without arithmetic coding. When a code value can be obtained, it is further determined whether representing the code value requires more bits than representing the original data; if there is no compression benefit, arithmetic coding is abandoned.
Therefore, the embodiment of the present invention encodes the data using the arithmetic coding algorithm to obtain a code value interval. When a code value corresponding to the data exists in the code value interval, it is determined that the bit number of the code value is less than the bit number of the data, and the code value is stored. By making this benefit judgment, the embodiment of the present invention can reduce erroneous coding and reduce the storage space of the data.
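The selection and benefit steps of Fig. 7 can be sketched as follows. This is an illustration under assumptions, not the embodiment's definitive procedure: a "suitable" code value is taken to be the shortest binary fraction inside the half-open coding interval, the search gives up after a hypothetical `max_bits` limit, and the original data is assumed to cost 8 bits per character. The function names are hypothetical.

```python
from fractions import Fraction
from math import ceil

def shortest_code(low, high, max_bits=64):
    """Return (numerator, bits) for the shortest binary fraction
    numerator / 2**bits inside [low, high), or None if no such
    fraction exists within `max_bits` bits."""
    for bits in range(1, max_bits + 1):
        n = ceil(low * 2**bits)            # smallest multiple of 2**-bits that is >= low
        if Fraction(n, 2**bits) < high:
            return n, bits
    return None

def choose_storage(data, low, high, bits_per_char=8):
    """Store the code value only if it is shorter than the raw data."""
    found = shortest_code(low, high)
    if found is None:                      # no code value can be selected: abandon
        return ("raw", data)
    n, bits = found
    if bits < len(data) * bits_per_char:   # benefit evaluation
        return ("code", f"{n:0{bits}b}")
    return ("raw", data)                   # no compression benefit: abandon

# Coding interval of "210" from the document's integer example: [0.74, 0.76).
print(choose_storage("210", Fraction(37, 50), Fraction(19, 25)))
```

For the interval [0.74, 0.76) the shortest binary fraction is 0.11 in binary (0.75), so two bits replace the three-character string in this sketch.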
Fig. 8 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method in Fig. 8 shows the process of arithmetic coding and arithmetic decoding based on re-expansion of the coding interval. The method shown in Fig. 8 includes:
810, input the source sequence.
Specifically, the sequence can be the character string sequence of the source data, for example a data type that combines digits and letters in a specific way, or a digits-only or letters-only data type. For example, the input sequence can be a data sheet number, a book index number, a bus license plate number, a product document number, an IMSI or an IMEI, or the phonetic notation corresponding to Chinese, Korean, Japanese, and so on.
820, source model.
Specifically, the source model includes the probability value of each character. The data processing device can perform arithmetic coding on the source sequence according to the source model.
830, arithmetic coding.
Specifically, the characters of the data are encoded one by one according to the source model using the arithmetic coding algorithm.
840, whether a preset condition is met.
Specifically, it is judged whether the number of characters of the data already encoded reaches a preset character number threshold, or whether the length of the coding interval of the data is less than a preset threshold. If so, step 850 is performed; otherwise, step 840 is performed.
850, re-expansion of the coding interval.
When the preset condition is met, the coding interval of the data is re-expanded; according to the re-expanded coding interval, the data continues to be encoded using the arithmetic coding algorithm to obtain a code value interval. Finally, the code value of the source sequence is obtained.
For example, suppose the current data is a character string of 12 characters and the preset threshold is 0.05. While the 12 characters of the current data are being encoded using the arithmetic coding algorithm, the coding interval of the data is re-expanded when its length falls below 0.05. For example, if the length of the coding interval is 0.1 after the 7th character and 0.04 after the 8th character, the coding interval of length 0.04 at the 8th character is re-expanded, for example from 0.04 to 1 or 10. The data (the 9th to 12th characters) then continues to be encoded using the arithmetic coding algorithm according to the re-expanded coding interval (1 or 10), and the code value interval (the coding interval corresponding to the 12th character) is obtained.
860, arithmetic decoding.
Specifically, decoding is carried out according to the source model and the coding information. For example, if interval expansion was carried out during encoding, the opposite operation, that is, scaling of the interval, is likewise carried out during decoding.
It should be noted that all information about the interval expansion is also transmitted to the decoder; that is, when the encoder sends the binary code value to the decoder, it also sends the shift scheme information to the decoder. This keeps the information synchronized and ensures that a correct decoding result is obtained at decoding time.
870, source model.
This corresponds to the source model in 820. Specifically, the source model includes the probability value of each character. The decoding device can perform arithmetic decoding according to the source model.
880, obtain the decoded sequence.
Specifically, the decoded sequence can be identical to the source sequence.
As the string length of the data increases, the length of the coding interval obtained by encoding becomes smaller and smaller, and it is not easy to obtain a suitable code value from a short coding interval. Re-expanding the coding interval makes it easier to obtain a suitable code value, which avoids, as far as possible, the problem that arithmetic coding cannot be carried out correctly, avoids erroneous coding, and achieves correct coding. In addition, the re-expansion of the interval in the embodiment of the present invention makes it possible to represent sufficiently long string data using a limited number of digits.
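The text does not spell out how the re-expanded interval is represented, so the following Python sketch shows only one possible reading, labeled as an assumption: when the interval width drops below the threshold, the encoder closes the current segment, records the character position, and restarts from the full interval [0, 1). Each segment yields its own code value (here the interval midpoint), and the decoder uses the recorded positions to know where to switch segments. All names, and the segment-based representation itself, are hypothetical.

```python
from fractions import Fraction

# Source model from the document's integer example: symbol -> probability.
MODEL = {"0": Fraction(2, 10), "1": Fraction(5, 10),
         "2": Fraction(2, 10), "3": Fraction(1, 10)}

def cumulative(model):
    """Map each symbol to (cumulative lower bound, probability), in sorted order."""
    table, acc = {}, Fraction(0)
    for sym in sorted(model):
        table[sym] = (acc, model[sym])
        acc += model[sym]
    return table

def encode_with_reexpansion(text, model, width_threshold):
    """Return (code_values, positions): one code value per segment and the
    1-based character positions after which the interval was re-expanded."""
    table = cumulative(model)
    low, width = Fraction(0), Fraction(1)
    code_values, positions = [], []
    for i, ch in enumerate(text, start=1):
        c_low, p = table[ch]
        low, width = low + width * c_low, width * p
        if width < width_threshold and i < len(text):
            code_values.append(low + width / 2)     # close this segment
            positions.append(i)                     # record re-expansion position
            low, width = Fraction(0), Fraction(1)   # re-expand to [0, 1)
    code_values.append(low + width / 2)
    return code_values, positions

def decode_with_rescaling(code_values, positions, total_len, model):
    """Inverse of the encoder: decode each segment, rescaling after every symbol."""
    table = cumulative(model)
    bounds = [0] + list(positions) + [total_len]
    out = []
    for value, start, end in zip(code_values, bounds, bounds[1:]):
        for _ in range(end - start):
            for sym, (c_low, p) in table.items():
                if c_low <= value < c_low + p:
                    out.append(sym)
                    value = (value - c_low) / p     # scale back to [0, 1)
                    break
    return "".join(out)

codes, marks = encode_with_reexpansion("212132", MODEL, Fraction(5, 100))
decoded = decode_with_rescaling(codes, marks, 6, MODEL)
print(marks, decoded)
```

In this reading the recorded positions play the role of the re-expansion character positions that must reach the decoder along with the code values; other representations of the expanded interval are equally compatible with the text.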
Fig. 9 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method in Fig. 9 shows the process of arithmetic coding of, and query operations on, ID-type character strings. The method shown in Fig. 9 includes:
910, obtain the ID-type character string.
Specifically, an ID-type character string can be a character string, such as a data sheet number, a book index number, or a bus license plate number, in which letters occur with relatively low probability and digits occur with relatively high probability. For example, letters usually occupy only 1 or 2 positions in the character string.
920, design the outline model.
Specifically, the outline model includes the probability of each character.
930, arithmetic coding.
Specifically, the characters of the data are encoded one by one according to the outline model using the arithmetic coding algorithm.
940, evaluation and judgment.
Specifically, it is determined whether a code value corresponding to the data exists in the coding interval, in other words, whether a code value can be selected. If so, it is determined whether the bit number of the code value meets the requirement, for example, whether the bit number of the code value is less than the bit number of the data.
950, whether the requirement is met.
If the requirement is met, step 970 is performed; otherwise, step 960 is performed.
960, abandon.
970, obtain the code value.
Specifically, the code value corresponding to the ID-type character string is determined from the code value interval.
980, equality comparison and/or sorting.
Specifically, the equality comparison of the data can be carried out according to the code value: when the code value is equal to a code value to be compared, it is determined that the data and the data corresponding to the code value to be compared are identical. The equality comparison of code values can be understood as matching the data. For example, when two code values are equal, it can be determined that the data corresponding to the two code values are identical, that is, the match succeeds; when two code values are unequal, it can be determined that the data corresponding to the two code values are different, that is, the match fails.
The data can also be sorted according to the code value. For example, the position of the code value among the code values to be sorted is determined according to the magnitude of the code value; the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted. Sorting the data can be understood as ordering multiple pieces of data. For example, suppose there are 5 pieces of data with 5 corresponding code values, and the 5 code values are sorted in ascending order. If the current code value is the 4th of the 5 code values, then the data corresponding to that code value is the 4th of the 5 pieces of data.
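Step 920 leaves the outline model to the designer. As a hedged illustration, the Python sketch below builds one hypothetical outline model in which digits share 90% of the probability mass and uppercase letters share the remaining 10% (both figures are assumptions, not values from the text), and compares the resulting interval width for the ID string 'A12986572' against a uniform model over the same 36 characters. The bit estimate `ceil(-log2(width))` is only a rough guide to the code value length.

```python
from fractions import Fraction
from math import ceil, log2
import string

DIGITS, LETTERS = string.digits, string.ascii_uppercase

def outline_model(digit_mass=Fraction(9, 10)):
    """Hypothetical outline model: digits share `digit_mass`, letters share the rest."""
    model = {d: digit_mass / len(DIGITS) for d in DIGITS}
    model.update({c: (1 - digit_mass) / len(LETTERS) for c in LETTERS})
    return model

def interval_width(text, model):
    """Width of the final coding interval: the product of the symbol probabilities."""
    width = Fraction(1)
    for ch in text:
        width *= model[ch]
    return width

def approx_bits(width):
    """Rough number of bits needed to name a point inside an interval of this width."""
    return ceil(-log2(float(width)))

id_string = "A12986572"
skewed = approx_bits(interval_width(id_string, outline_model()))
uniform = approx_bits(interval_width(id_string, outline_model(Fraction(10, 36))))
raw_bits = 8 * len(id_string)   # assumes 8 bits per character in the raw form
print(skewed, uniform, raw_bits)
```

Matching the model to the digit-heavy character statistics of ID strings widens the final interval, which is what makes a short code value, and therefore a positive benefit evaluation, more likely.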
Fig. 10 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method in Fig. 10 shows the process of arithmetic coding of the alphabetic character strings of fields and of sorting the fields. The method shown in Fig. 10 includes:
1010, obtain the field.
Specifically, the field can be a Chinese, Korean, or Japanese field, and so on; the embodiment of the present invention is not limited to this, and the field can also be any other field that can be converted into an alphabetic character string through a phonetic notation. For example, the fields can be the Chinese fields "outstanding", "good" and "pass".
1020, alphabetic character string.
Specifically, the field is converted into an alphabetic character string. For example, the alphabetic character strings corresponding to "outstanding", "good" and "pass" are "youxiu", "lianghao" and "jige", respectively.
1030, pinyin letter probabilities.
Specifically, the probability of each letter is obtained. For example, the probability distribution of the letters in Chinese pinyin (spelling) is: A (0.107), B (0.014), C (0.017), D (0.030), E (0.062), F (0.009), G (0.060), H (0.067), I (0.141), J (0.023), K (0.008), L (0.017), M (0.014), N (0.117), O (0.065), P (0.008), Q (0.013), R (0.006), S (0.026), T (0.015), U (0.096), V (0.001), W (0.010), X (0.020), Y (0.028), Z (0.026).
1040, arithmetic coding.
Specifically, encoding is carried out using the arithmetic coding algorithm according to the above pinyin letter probabilities.
1050, evaluation and judgment.
Specifically, it is determined whether a code value corresponding to the data exists in the coding interval, in other words, whether a code value can be selected. If so, it is determined whether the bit number of the code value meets the requirement, for example, whether the bit number of the code value is less than the bit number of the data.
1060, whether the requirement is met.
If the requirement is met, step 1080 is performed; otherwise, step 1070 is performed.
1070, return.
1080, obtain the code value.
Specifically, the code value corresponding to the alphabetic character string is determined from the code value interval.
1090, field sorting.
For example, the fields "outstanding", "good" and "pass" are encoded with the letters arranged in alphabetical order. The alphabetic character string corresponding to "outstanding" is "youxiu", and its code value is 0.96684845; the alphabetic character string corresponding to "good" is "lianghao", and its code value is 0.544375656; the alphabetic character string corresponding to "pass" is "jige", and its code value is 0.516228. Sorted in ascending order, the code values are 0.516228, 0.544375656 and 0.96684845, corresponding to "pass", "good" and "outstanding", respectively.
Fig. 11 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method in Fig. 11 shows the process of fuzzy query based on arithmetic coding. The method shown in Fig. 11 includes:
1110, select the index segment.
Specifically, the prefix character string segment for which a fuzzy query is required is obtained. For example, the index segment is "210".
1120, arithmetic coding.
Specifically, the index segment is encoded according to the arithmetic coding algorithm.
1130, code value interval.
Specifically, the code value interval of the index segment is obtained. For example, take the integer symbol set {0, 1, 2, 3} with probability distribution {0.2, 0.5, 0.2, 0.1}; the code value interval of "210" is [0.74, 0.76].
1140, obtain the code value corresponding to a sequence.
Specifically, the code values corresponding to the sequences to be queried are obtained. For example, the code values corresponding to 212132, 210312, 210231 and 211123 are 0.8238, 0.7592, 0.7576 and 0.7923, respectively.
1150, check and record.
Specifically, the fuzzy query is carried out according to the code value corresponding to the sequence and the code value interval of the index segment, and the result is recorded. For example, the code value interval of "210" is [0.74, 0.76], and the code values corresponding to 212132, 210312, 210231 and 211123 are 0.8238, 0.7592, 0.7576 and 0.7923, respectively. Since 0.7592 and 0.7576 lie in the coding interval [0.74, 0.76] of the prefix character string of the fuzzy query, while 0.8238 and 0.7923 do not, 210312 and 210231 satisfy the fuzzy query condition, and 212132 and 211123 do not.
1160, whether to finish.
Specifically, if so, step 1170 is performed; otherwise, step 1140 is performed to obtain the code value corresponding to another sequence.
1170, output the result.
It should be noted that the examples of Fig. 7 to Fig. 11 are intended to help those skilled in the art better understand the embodiments of the present invention, and are not intended to limit the scope of the embodiments. Those skilled in the art can obviously make various equivalent modifications or variations based on the examples of Fig. 7 to Fig. 11, and such modifications or variations also fall within the scope of the embodiments of the present invention.
It should be understood that the magnitude of the sequence numbers of the above processes does not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present invention.
The data processing methods of the embodiments of the present invention have been described in detail above with reference to Fig. 1 to Fig. 11. The data processing devices of the embodiments of the present invention are described below with reference to Fig. 12 to Fig. 19.
Fig. 12 is a schematic block diagram of a data processing device according to an embodiment of the present invention. The data processing device 1200 shown in Fig. 12 can be an encoding device and includes: a coding unit 1210, an acquiring unit 1220, a comparing unit 1230 and a first storage unit 1240.
Specifically, the coding unit 1210 is configured to encode the data using an arithmetic coding algorithm to obtain a code value interval; the acquiring unit 1220 is configured to obtain a code value according to the code value interval when a code value corresponding to the data exists in the code value interval; the comparing unit 1230 is configured to compare the bit number of the code value with the bit number of the data to obtain a comparison result; and the first storage unit 1240 is configured to perform a storage operation according to the comparison result.
Therefore, the embodiment of the present invention encodes the data using the arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, obtains the code value according to the code value interval; compares the bit number of the code value with the bit number of the data to obtain a comparison result; and performs a storage operation according to the comparison result. The embodiment of the present invention can reduce erroneous coding and reduce the storage space of the data.
Optionally, as another embodiment, the comparison result is that the bit number of the code value is less than the bit number of the data, and the first storage unit stores the code value according to the comparison result.
Alternatively, as another embodiment, the comparison result is that the bit number of the code value is greater than or equal to the bit number of the data, and the first storage unit stores the data according to the comparison result.
Optionally, as another embodiment, the device further includes an applying unit configured to perform an application operation on the data according to the code value, where the application operation includes at least one of equality comparison, sorting, and fuzzy query.
Optionally, as another embodiment, the data is an ID-type character string, the application operation includes equality comparison, and the applying unit carries out the equality comparison of the data according to the code value.
Specifically, as another embodiment, when the code value is equal to a code value to be compared, the applying unit determines that the data and the data corresponding to the code value to be compared are identical.
Alternatively, as another embodiment, the data is an ID-type character string or the alphabetic character string of a field, the application operation includes sorting, and the applying unit sorts the data according to the code value.
Specifically, as another embodiment, the applying unit determines the position of the code value among the code values to be sorted according to the magnitude of the code value; the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
Alternatively, as another embodiment, the application operation includes fuzzy query, and the applying unit performs a fuzzy query on the data according to the code value.
Specifically, as another embodiment, the applying unit determines whether the data includes a prefix character string according to whether the code value lies in the coding interval of the prefix character string of the required fuzzy query. When the code value lies in the coding interval of the prefix character string of the required fuzzy query, the data includes the prefix character string; when the code value does not lie in that coding interval, the data does not include the prefix character string.
Optionally, as another embodiment, the coding unit 1210 encodes the data using the arithmetic coding algorithm to obtain a coding interval; re-expands the coding interval of the data to obtain a re-expanded coding interval; and, according to the re-expanded coding interval, continues to encode the data using the arithmetic coding algorithm to obtain the code value interval.
Further, when at least one of the following conditions is met, the coding unit 1210 re-expands the coding interval of the data to obtain the re-expanded coding interval and records the corresponding re-expansion character position: the number of characters of the data already encoded reaches a character number threshold; the length of the coding interval of the data is less than an interval threshold.
Optionally, as another embodiment, the device further includes a determination unit configured to determine, before the acquiring unit 1220 obtains the code value, whether a suitable code value corresponding to the data exists in the code value interval.
Optionally, as another embodiment, the device further includes a second storage unit configured to store the data when no suitable code value corresponding to the data exists in the code value interval.
It should be understood that the data processing device shown in Fig. 12 corresponds to the data processing method shown in Fig. 1. The data processing device shown in Fig. 12 can implement each process of the data processing method of Fig. 1; for the functions of the device shown in Fig. 12, refer to the description of the data processing method of Fig. 1. To avoid repetition, details are not described here again.
Fig. 13 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1300 shown in Fig. 13 can be an encoding device and includes: a first coding unit 1310, an expanding unit 1320, a second coding unit 1330, an acquiring unit 1340 and a storage unit 1350.
Specifically, the first coding unit 1310 is configured to encode the data using an arithmetic coding algorithm to obtain a coding interval; the expanding unit 1320 is configured to re-expand the coding interval of the data to obtain a re-expanded coding interval; the second coding unit 1330 is configured to continue encoding the data using the arithmetic coding algorithm according to the re-expanded coding interval to obtain a code value interval; the acquiring unit 1340 is configured to obtain a code value according to the code value interval; and the storage unit 1350 is configured to store the code value.
Therefore, the embodiment of the present invention re-expands the coding interval of the data. Because the coding interval has been re-expanded, the code value interval is enlarged accordingly, so a suitable code value can be obtained more easily from the enlarged code value interval, which avoids erroneous coding and achieves correct coding. In addition, the re-expansion of the interval in the embodiment of the present invention makes it possible to represent sufficiently long string data using a limited number of digits.
Optionally, as another embodiment, when at least one of the following conditions is met, the expanding unit 1320 re-expands the coding interval of the data to obtain the re-expanded coding interval and records the corresponding re-expansion character position: the number of characters of the data already encoded reaches a character number threshold; the length of the coding interval of the data is less than an interval threshold.
It should be understood that the data processing device shown in Fig. 13 corresponds to the data processing method shown in Fig. 4. The data processing device shown in Fig. 13 can implement each process of the data processing method of Fig. 4; for the functions of the device shown in Fig. 13, refer to the description of the data processing method of Fig. 4. To avoid repetition, details are not described here again.
Figure 14 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1400 shown in Figure 14 may be a decoding device and includes: a first acquiring unit 1410, a first decoding unit 1420, a scaling unit 1430 and a second decoding unit 1440.
Specifically, the first acquiring unit 1410 is configured to obtain the code value of the data and the re-expansion character position; the first decoding unit 1420 is configured to decode the code value of the data using the arithmetic coding algorithm to obtain a decoding interval; the scaling unit 1430 is configured to rescale the decoding interval of the data according to the re-expansion character position to obtain a rescaled decoding interval; the second decoding unit 1440 is configured to continue decoding the data using the arithmetic coding algorithm according to the rescaled decoding interval, to obtain the data.
Therefore, for a code value that was produced with a re-expanded coding interval, the embodiment of the present invention rescales the decoding interval, which avoids erroneous decoding and achieves correct decoding.
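For orientation, plain arithmetic decoding of a code value can be sketched as follows. This is a minimal illustration, not the patented decoder. The uniform symbol model, the `EOS` terminator, and the names `ALPHABET`, `encode` and `decode` are assumptions. Exact rationals are used, so the interval never runs out of precision and the rescaling step described above is deliberately not modelled.

```python
from fractions import Fraction

EOS = "\0"  # assumed end-of-string symbol, placed first in the model
ALPHABET = [EOS] + list("0123456789abcdefghijklmnopqrstuvwxyz")
STEP = Fraction(1, len(ALPHABET))  # uniform symbol probability (assumption)


def encode(text):
    """Return the low end of the coding interval of text followed by EOS."""
    low, width = Fraction(0), Fraction(1)
    for ch in text + EOS:
        low += width * STEP * ALPHABET.index(ch)  # move to the symbol's sub-interval
        width *= STEP                             # narrow the interval
    return low


def decode(code):
    """Recover the text by repeatedly locating the sub-interval holding code."""
    low, width = Fraction(0), Fraction(1)
    chars = []
    while True:
        idx = int((code - low) / (width * STEP))  # index of the containing sub-interval
        low += width * STEP * idx
        width *= STEP
        if ALPHABET[idx] == EOS:
            return "".join(chars)
        chars.append(ALPHABET[idx])
```

A fixed-precision implementation would need the recorded re-expansion positions to rescale the decoding interval at the matching characters, which this sketch avoids by using unbounded rationals.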
Optionally, as another embodiment, the scaling unit 1430 determines a rescaling character position according to the re-expansion character position, where the re-expansion character position and the rescaling character position are inverses of each other, and rescales the decoding interval of the data according to the rescaling character position to obtain the rescaled decoding interval.
It should be understood that the data processing device shown in Figure 14 corresponds to the data processing method shown in Figure 5. The device shown in Figure 14 can implement each process of the data processing method of Figure 5; for the functions of the device, refer to the description of the method of Figure 5. To avoid repetition, details are not described here again.
Figure 15 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1500 shown in Figure 15 may be an encoding device and includes: a coding unit 1510, an acquiring unit 1520, a storage unit 1530 and an applying unit 1540.
Specifically, the coding unit 1510 is configured to encode the data using the arithmetic coding algorithm to obtain a code value interval; the acquiring unit 1520 is configured to obtain a code value according to the code value interval; the storage unit 1530 is configured to store the code value; the applying unit 1540 is configured to perform an application operation on the data according to the code value, where the application operation includes at least one of equality comparison, sorting and fuzzy query.
Therefore, the embodiment of the present invention obtains a code value by encoding the data and performs at least one of equality comparison, sorting and fuzzy query on the data according to the code value. Unlike the existing approach of performing equality comparison, sorting and fuzzy query on the source data, it converts operations that were originally performed on complex data into equivalent processing on code values, which is quick and simple.
Optionally, as another embodiment, the data is an ID-type character string, the application operation includes equality comparison, and the applying unit 1540 performs equality comparison on the data according to the code value.
Specifically, as another embodiment, when the code value is equal to a code value to be compared, the applying unit 1540 determines that the data and the data corresponding to the code value to be compared are the same data.
Alternatively, as another embodiment, the data is an ID-type character string or a field alphabetic character string, the application operation includes sorting, and the applying unit 1540 sorts the data according to the code value.
Specifically, as another embodiment, the applying unit 1540 determines, according to the magnitude of the code value, the position of the code value among the code values to be sorted; the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
Alternatively, as another embodiment, the application operation includes fuzzy query, and the applying unit 1540 performs a fuzzy query on the data according to the code value.
Specifically, as another embodiment, the applying unit 1540 determines whether the data includes a prefix character string according to whether the code value lies in the coding interval of the prefix character string of the required fuzzy query: when the code value lies in the coding interval of the prefix character string, the data includes the prefix character string; when the code value does not lie in that coding interval, the data does not include the prefix character string.
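The three application operations above can be illustrated on code values directly. The following is a minimal sketch under stated assumptions, not the patented implementation: it uses a fixed uniform symbol model whose symbols are listed in character order, an assumed terminator `EOS` that sorts before every other symbol, and exact rationals instead of a fixed number of digits. The names `interval`, `code_value`, `same_data`, `sort_by_code` and `has_prefix` are hypothetical.

```python
from fractions import Fraction

EOS = "\0"  # assumed terminator, ordered before every other symbol
ALPHABET = [EOS] + list("0123456789abcdefghijklmnopqrstuvwxyz")
STEP = Fraction(1, len(ALPHABET))  # uniform model (assumption)


def interval(text):
    """Coding interval [low, high) reached after encoding text (no terminator)."""
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        low += width * STEP * ALPHABET.index(ch)
        width *= STEP
    return low, low + width


def code_value(text):
    """Code value of a complete string: low end of its terminated interval."""
    return interval(text + EOS)[0]


def same_data(code_a, code_b):
    """Equality comparison: equal code values mean identical data."""
    return code_a == code_b


def sort_by_code(codes):
    """Sorting: with an ordered model, numeric order of code values follows string order."""
    return sorted(codes)


def has_prefix(code, prefix):
    """Prefix query: the code value lies inside the prefix's coding interval."""
    low, high = interval(prefix)
    return low <= code < high
```

For example, `has_prefix(code_value("user42"), "user")` checks the prefix "user" with two rational comparisons instead of a character-by-character scan. The ordering property relies on the model listing symbols in character order with the terminator first; a model ordered differently would not preserve it.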
It should be understood that the data processing device shown in Figure 15 corresponds to the data processing method shown in Figure 6. The device shown in Figure 15 can implement each process of the data processing method of Figure 6; for the functions of the device, refer to the description of the method of Figure 6. To avoid repetition, details are not described here again.
Figure 16 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1600 shown in Figure 16 may be an encoding device and includes: a processor 1610, a memory 1620 and a bus system 1630.
Specifically, the processor 1610 calls, through the bus system 1630, the code stored in the memory 1620 to: encode the data using the arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, obtain the code value according to the code value interval; compare the number of bits of the code value with the number of bits of the data to obtain a comparison result; and perform a storage operation according to the comparison result.
The embodiment of the present invention encodes the data using the arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, obtains the code value according to the code value interval; compares the number of bits of the code value with the number of bits of the data to obtain a comparison result; and performs a storage operation according to the comparison result. The embodiment of the present invention can thereby reduce erroneous encoding and reduce the storage space occupied by the data.
The methods disclosed in the above embodiments of the present invention may be applied to, or implemented by, the processor 1610. The processor 1610 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor 1610 or by instructions in the form of software. The processor 1610 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1620; the processor 1610 reads the information in the memory 1620 and completes the steps of the above methods in combination with its hardware. In addition to a data bus, the bus system 1630 may also include a power bus, a control bus, a status signal bus and the like. For clarity of description, however, the various buses are all denoted as the bus system 1630 in the figure.
Optionally, as another embodiment, the comparison result is that the number of bits of the code value is less than the number of bits of the data, and the processor 1610 stores the code value according to the comparison result.
Alternatively, as another embodiment, the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data, and the processor 1610 stores the data according to the comparison result.
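The two storage branches above amount to a single comparison. The sketch below is illustrative only and rests on assumptions that are not in the patent: the code value is modelled as a non-negative integer whose size is `int.bit_length()`, the data size is taken as the length of its UTF-8 encoding in bits, and the function name `choose_stored_form` and the tagged return value are hypothetical.

```python
def choose_stored_form(data, code):
    """Decide what to store, based on the bit-count comparison.

    data -- original character string
    code -- code value, modelled here as a non-negative int (assumption)
    Returns ("code", code) when the code value needs fewer bits than the
    data, otherwise ("data", data).
    """
    data_bits = len(data.encode("utf-8")) * 8  # assumed measure of the data size
    code_bits = code.bit_length()              # assumed measure of the code value size
    if code_bits < data_bits:
        return ("code", code)
    return ("data", data)
```

A caller would write the returned value to the database together with the tag, so that a reader knows whether decoding is needed.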
Optionally, as another embodiment, the processor 1610 is further configured to perform an application operation on the data according to the code value, where the application operation includes at least one of equality comparison, sorting and fuzzy query.
Optionally, as another embodiment, the data is an ID-type character string, the application operation includes equality comparison, and the processor 1610 performs equality comparison on the data according to the code value.
Specifically, as another embodiment, when the code value is equal to a code value to be compared, the processor 1610 determines that the data and the data corresponding to the code value to be compared are the same data.
Alternatively, as another embodiment, the data is an ID-type character string or a field alphabetic character string, the application operation includes sorting, and the processor 1610 sorts the data according to the code value.
Specifically, as another embodiment, the processor 1610 determines, according to the magnitude of the code value, the position of the code value among the code values to be sorted; the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
Alternatively, as another embodiment, the application operation includes fuzzy query, and the processor 1610 performs a fuzzy query on the data according to the code value.
Specifically, as another embodiment, the processor 1610 determines whether the data includes a prefix character string according to whether the code value lies in the coding interval of the prefix character string of the required fuzzy query: when the code value lies in the coding interval of the prefix character string, the data includes the prefix character string; when the code value does not lie in that coding interval, the data does not include the prefix character string.
Optionally, as another embodiment, the processor 1610 encodes the data using the arithmetic coding algorithm to obtain a coding interval; re-expands the coding interval of the data to obtain a re-expanded coding interval; and, according to the re-expanded coding interval, continues encoding the data using the arithmetic coding algorithm to obtain the code value interval.
Further, the processor 1610 re-expands the coding interval of the data when at least one of the following conditions is met, obtains the re-expanded coding interval, and records the corresponding re-expansion character position: the number of characters of the data that have been encoded reaches a character number threshold; and the length of the coding interval of the data is less than an interval threshold.
Optionally, as another embodiment, before obtaining the code value, the processor 1610 determines whether a suitable code value corresponding to the data exists in the code value interval.
Optionally, as another embodiment, the processor 1610 stores the data when no suitable code value corresponding to the data exists in the code value interval.
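One possible reading of "a suitable code value exists" is that some binary fraction with a bounded number of bits falls inside the code value interval. The sketch below illustrates that reading only; the criterion itself, the bit budget `max_bits`, and the function name `find_code_value` are assumptions, not statements from the patent.

```python
from fractions import Fraction
from math import ceil


def find_code_value(low, high, max_bits):
    """Look for the shortest binary fraction m / 2**bits inside [low, high).

    Returns (m, bits) for the first bit length that fits, or None when no
    value with at most max_bits bits lies in the interval, in which case
    the caller would store the original data instead.
    """
    for bits in range(1, max_bits + 1):
        m = ceil(low * 2**bits)            # smallest numerator not below low
        if Fraction(m, 2**bits) < high:    # still inside the half-open interval?
            return m, bits
    return None
```

For the interval [1/4, 1/2) the search returns the two-bit value 01, that is m = 1 with bits = 2. An interval narrower than 2**-max_bits may contain no candidate at all.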
It should be understood that the data processing device shown in Figure 16 corresponds to the data processing method shown in Figure 1. The device shown in Figure 16 can implement each process of the data processing method of Figure 1; for the functions of the device, refer to the description of the method of Figure 1. To avoid repetition, details are not described here again.
Figure 17 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1700 shown in Figure 17 may be an encoding device and includes: a processor 1710, a memory 1720 and a bus system 1730.
Specifically, the processor 1710 calls, through the bus system 1730, the code stored in the memory 1720 to: encode the data using the arithmetic coding algorithm to obtain a coding interval; re-expand the coding interval of the data to obtain a re-expanded coding interval; according to the re-expanded coding interval, continue encoding the data using the arithmetic coding algorithm to obtain a code value interval; obtain a code value according to the code value interval; and store the code value.
Therefore, the embodiment of the present invention re-expands the coding interval of the data. Because the coding interval has been re-expanded, the code value interval is enlarged accordingly, so a suitable code value can be obtained more easily from the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, re-expanding the interval makes it possible to represent sufficiently long character string data within a limited number of digits.
The methods disclosed in the above embodiments of the present invention may be applied to, or implemented by, the processor 1710. The processor 1710 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor 1710 or by instructions in the form of software. The processor 1710 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1720; the processor 1710 reads the information in the memory 1720 and completes the steps of the above methods in combination with its hardware. In addition to a data bus, the bus system 1730 may also include a power bus, a control bus, a status signal bus and the like. For clarity of description, however, the various buses are all denoted as the bus system 1730 in the figure.
Optionally, as another embodiment, the processor 1710 re-expands the coding interval of the data when at least one of the following conditions is met, obtains the re-expanded coding interval, and records the corresponding re-expansion character position: the number of characters of the data that have been encoded reaches a character number threshold; and the length of the coding interval of the data is less than an interval threshold.
It should be understood that the data processing device shown in Figure 17 corresponds to the data processing method shown in Figure 4. The device shown in Figure 17 can implement each process of the data processing method of Figure 4; for the functions of the device, refer to the description of the method of Figure 4. To avoid repetition, details are not described here again.
Figure 18 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1800 shown in Figure 18 may be a decoding device and includes: a processor 1810, a memory 1820 and a bus system 1830.
Specifically, the processor 1810 calls, through the bus system 1830, the code stored in the memory 1820 to: obtain the code value of the data and the re-expansion character position; decode the code value of the data using the arithmetic coding algorithm to obtain a decoding interval; rescale the decoding interval of the data according to the re-expansion character position to obtain a rescaled decoding interval; and, according to the rescaled decoding interval, continue decoding the data using the arithmetic coding algorithm to obtain the data.
Therefore, for a code value that was produced with a re-expanded coding interval, the embodiment of the present invention rescales the decoding interval, which avoids erroneous decoding and achieves correct decoding.
The methods disclosed in the above embodiments of the present invention may be applied to, or implemented by, the processor 1810. The processor 1810 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor 1810 or by instructions in the form of software. The processor 1810 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1820; the processor 1810 reads the information in the memory 1820 and completes the steps of the above methods in combination with its hardware. In addition to a data bus, the bus system 1830 may also include a power bus, a control bus, a status signal bus and the like. For clarity of description, however, the various buses are all denoted as the bus system 1830 in the figure.
Optionally, as another embodiment, the processor 1810 determines a rescaling character position according to the re-expansion character position, where the re-expansion character position and the rescaling character position are inverses of each other, and rescales the decoding interval of the data according to the rescaling character position to obtain the rescaled decoding interval.
It should be understood that the data processing device shown in Figure 18 corresponds to the data processing method shown in Figure 5. The device shown in Figure 18 can implement each process of the data processing method of Figure 5; for the functions of the device, refer to the description of the method of Figure 5. To avoid repetition, details are not described here again.
Figure 19 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1900 shown in Figure 19 may be an encoding device and includes: a processor 1910, a memory 1920 and a bus system 1930.
Specifically, the processor 1910 calls, through the bus system 1930, the code stored in the memory 1920 to: encode the data using the arithmetic coding algorithm to obtain a code value interval; obtain a code value according to the code value interval; store the code value; and perform an application operation on the data according to the code value, where the application operation includes at least one of equality comparison, sorting and fuzzy query.
Therefore, the embodiment of the present invention obtains a code value by encoding the data and performs at least one of equality comparison, sorting and fuzzy query on the data according to the code value. Unlike the existing approach of performing equality comparison, sorting and fuzzy query on the source data, it converts operations that were originally performed on complex data into equivalent processing on code values, which is quick and simple.
The methods disclosed in the above embodiments of the present invention may be applied to, or implemented by, the processor 1910. The processor 1910 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor 1910 or by instructions in the form of software. The processor 1910 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1920; the processor 1910 reads the information in the memory 1920 and completes the steps of the above methods in combination with its hardware. In addition to a data bus, the bus system 1930 may also include a power bus, a control bus, a status signal bus and the like. For clarity of description, however, the various buses are all denoted as the bus system 1930 in the figure.
Optionally, as another embodiment, the data is an ID-type character string, the application operation includes equality comparison, and the processor 1910 performs equality comparison on the data according to the code value.
Specifically, as another embodiment, when the code value is equal to a code value to be compared, the processor 1910 determines that the data and the data corresponding to the code value to be compared are the same data.
Alternatively, as another embodiment, the data is an ID-type character string or a field alphabetic character string, the application operation includes sorting, and the processor 1910 sorts the data according to the code value.
Specifically, as another embodiment, the processor 1910 determines, according to the magnitude of the code value, the position of the code value among the code values to be sorted; the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
Alternatively, as another embodiment, the application operation includes fuzzy query, and the processor 1910 performs a fuzzy query on the data according to the code value.
Specifically, as another embodiment, the processor 1910 determines whether the data includes a prefix character string according to whether the code value lies in the coding interval of the prefix character string of the required fuzzy query: when the code value lies in the coding interval of the prefix character string, the data includes the prefix character string; when the code value does not lie in that coding interval, the data does not include the prefix character string.
It should be understood that the data processing device shown in Figure 19 corresponds to the data processing method shown in Figure 6. The device shown in Figure 19 can implement each process of the data processing method of Figure 6; for the functions of the device, refer to the description of the method of Figure 6. To avoid repetition, details are not described here again.
It should be understood that "one embodiment" or "an embodiment" mentioned throughout the specification means that a particular feature, structure or characteristic related to the embodiment is included in at least one embodiment of the present invention. Therefore, "in one embodiment" or "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. In addition, these particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present invention, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
In addition, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
It should be understood that, in the embodiments of the present invention, "B corresponding to A" means that B is associated with A and that B can be determined according to A. It should also be understood, however, that determining B according to A does not mean that B is determined only according to A; B may also be determined according to A and/or other information.
A person of ordinary skill in the art may appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments, and details are not described here again.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Through the above description of the embodiments, a person skilled in the art can clearly understand that the present invention may be implemented by hardware, by firmware, or by a combination thereof. When implemented in software, the above functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example and not limitation, computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection may appropriately become a computer-readable medium. For example, if the software is transmitted from a website, server or other remote source using a coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio and microwave, then the coaxial cable, optical fiber cable, twisted pair, DSL, or wireless technologies such as infrared, radio and microwave are included in the definition of the medium. As used in the present invention, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the protection scope of computer-readable media.
In short, the foregoing descriptions are merely preferred embodiments of the technical solutions of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (30)
1. A data processing method, characterized by comprising:
encoding data using an arithmetic coding algorithm to obtain a code value interval;
when a code value corresponding to the data exists in the code value interval, obtaining the code value according to the code value interval;
comparing the number of bits of the code value with the number of bits of the data to obtain a comparison result; and
performing a storage operation in a database according to the comparison result.
2. The method according to claim 1, characterized in that the comparison result is that the number of bits of the code value is less than the number of bits of the data,
wherein the performing a storage operation in a database according to the comparison result comprises:
storing the code value in the database according to the comparison result.
3. The method according to claim 1, characterized in that the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data,
wherein the performing a storage operation in a database according to the comparison result comprises:
storing the data in the database according to the comparison result.
4. The method according to claim 2, characterized by further comprising:
performing an application operation on the data according to the code value, the application operation comprising at least one of equality comparison, sorting, and fuzzy query.
5. The method according to claim 4, characterized in that the data is an identifier (ID)-type character string, the application operation comprises equality comparison, and the performing an application operation on the data according to the code value comprises:
when the code value is equal to a code value to be compared, determining that the data and the data corresponding to the code value to be compared are the same data.
6. The method according to claim 4, characterized in that the data is an ID-type character string or a field alphabetic character string, the application operation comprises sorting, and the performing an application operation on the data according to the code value comprises:
determining, according to the magnitude of the code value, the position of the code value among code values to be sorted, wherein the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
7. The method according to claim 4, characterized in that the application operation comprises fuzzy query, and the performing an application operation on the data according to the code value comprises:
determining, according to whether the code value lies in the coding interval of a prefix character string of the required fuzzy query, whether the data includes the prefix character string,
wherein, when the code value lies in the coding interval of the prefix character string of the required fuzzy query, the data includes the prefix character string, and
when the code value does not lie in the coding interval of the prefix character string of the required fuzzy query, the data does not include the prefix character string.
8. The method according to any one of claims 1 to 7, characterized in that the encoding data using an arithmetic coding algorithm to obtain a code value interval comprises:
encoding the data using the arithmetic coding algorithm to obtain a coding interval;
re-expanding the coding interval of the data to obtain a re-expanded coding interval; and
continuing to encode the data using the arithmetic coding algorithm according to the re-expanded coding interval, to obtain the code value interval.
9. The method according to claim 8, characterized in that the re-expanding the coding interval of the data to obtain a re-expanded coding interval comprises:
when at least one of the following conditions is met, re-expanding the coding interval of the data to obtain the re-expanded coding interval, and recording the corresponding re-expansion character position:
the number of characters of the data that have been encoded reaches a character count threshold; and the length of the coding interval of the data is less than an interval threshold.
10. The method according to any one of claims 1 to 7, characterized in that, before the obtaining the code value according to the code value interval when a code value corresponding to the data exists in the code value interval, the method further comprises:
determining whether a suitable code value corresponding to the data exists in the code value interval.
11. The method according to claim 10, characterized by further comprising:
when no suitable code value corresponding to the data exists in the code value interval, storing the data.
12. A data processing method, characterized by comprising:
encoding data using an arithmetic coding algorithm to obtain a code value interval;
obtaining a code value according to the code value interval;
storing the code value in a database; and
performing an application operation on the data according to the code value, the application operation comprising at least one of equality comparison, sorting, and fuzzy query;
wherein the storing the code value in a database comprises:
comparing the number of bits of the code value with the number of bits of the data to obtain a comparison result; and
storing the code value in the database according to the comparison result.
13. The method according to claim 12, characterized in that the data is an ID-type character string, the application operation comprises equality comparison, and the performing an application operation on the data according to the code value comprises:
when the code value is equal to a code value to be compared, determining that the data and the data corresponding to the code value to be compared are the same data.
14. The method according to claim 12, characterized in that the data is an ID-type character string or a field alphabetic character string, the application operation comprises sorting, and the performing an application operation on the data according to the code value comprises:
determining, according to the magnitude of the code value, the position of the code value among code values to be sorted, wherein the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
15. The method according to claim 12, characterized in that the application operation comprises fuzzy query, and the performing an application operation on the data according to the code value comprises:
determining, according to whether the code value lies in the coding interval of a prefix character string of the required fuzzy query, whether the data includes the prefix character string,
wherein, when the code value lies in the coding interval of the prefix character string of the required fuzzy query, the data includes the prefix character string, and
when the code value does not lie in the coding interval of the prefix character string of the required fuzzy query, the data does not include the prefix character string.
16. A data processing device, characterized by comprising:
a coding unit, configured to encode data using an arithmetic coding algorithm to obtain a code value interval;
an acquiring unit, configured to obtain, when a code value corresponding to the data exists in the code value interval, the code value according to the code value interval;
a comparing unit, configured to compare the number of bits of the code value with the number of bits of the data to obtain a comparison result; and
a first storage unit, configured to perform a storage operation in a database according to the comparison result.
17. The device according to claim 16, characterized in that the comparison result is that the number of bits of the code value is less than the number of bits of the data,
wherein the first storage unit stores the code value in the database according to the comparison result.
18. The device according to claim 16, characterized in that the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data,
wherein the first storage unit stores the data in the database according to the comparison result.
19. The device according to claim 17, characterized by further comprising:
an application unit, configured to perform an application operation on the data according to the code value, the application operation comprising at least one of equality comparison, sorting, and fuzzy query.
20. The device according to claim 19, characterized in that the data is an ID-type character string, the application operation comprises equality comparison, and the application unit determines, when the code value is equal to a code value to be compared, that the data and the data corresponding to the code value to be compared are the same data.
21. The device according to claim 19, characterized in that the data is an ID-type character string or a field alphabetic character string, the application operation comprises sorting, and the application unit determines, according to the magnitude of the code value, the position of the code value among code values to be sorted, wherein the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
22. The device according to claim 19, characterized in that the application operation comprises fuzzy query, and the application unit determines, according to whether the code value lies in the coding interval of a prefix character string of the required fuzzy query, whether the data includes the prefix character string,
wherein, when the code value lies in the coding interval of the prefix character string of the required fuzzy query, the data includes the prefix character string, and
when the code value does not lie in the coding interval of the prefix character string of the required fuzzy query, the data does not include the prefix character string.
23. The device according to any one of claims 16 to 22, characterized in that the coding unit encodes the data using the arithmetic coding algorithm to obtain a coding interval; re-expands the coding interval of the data to obtain a re-expanded coding interval; and continues to encode the data using the arithmetic coding algorithm according to the re-expanded coding interval, to obtain the code value interval.
24. The device according to claim 23, characterized in that, when at least one of the following conditions is met, the coding unit re-expands the coding interval of the data to obtain the re-expanded coding interval, and records the corresponding re-expansion character position:
the number of characters of the data that have been encoded reaches a character count threshold; and the length of the coding interval of the data is less than an interval threshold.
25. The device according to any one of claims 16 to 22, characterized by further comprising:
a determining unit, configured to determine, before the acquiring unit obtains the code value, whether a suitable code value corresponding to the data exists in the code value interval.
26. The device according to claim 25, characterized by further comprising:
a second storage unit, configured to store the data when no suitable code value corresponding to the data exists in the code value interval.
27. A data processing device, characterized by comprising:
a coding unit, configured to encode data using an arithmetic coding algorithm to obtain a code value interval;
an acquiring unit, configured to obtain a code value according to the code value interval;
a storage unit, configured to store the code value in a database; and
an application unit, configured to perform an application operation on the data according to the code value, the application operation comprising at least one of equality comparison, sorting, and fuzzy query;
wherein the storage unit is specifically configured to:
compare the number of bits of the code value with the number of bits of the data to obtain a comparison result; and
store the code value in the database according to the comparison result.
28. The device according to claim 27, characterized in that the data is an ID-type character string, the application operation comprises equality comparison, and the application unit determines, when the code value is equal to a code value to be compared, that the data and the data corresponding to the code value to be compared are the same data.
29. The device according to claim 27, characterized in that the data is an ID-type character string or a field alphabetic character string, the application operation comprises sorting, and the application unit determines, according to the magnitude of the code value, the position of the code value among code values to be sorted, wherein the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
30. The device according to claim 27, characterized in that the application operation comprises fuzzy query, and the application unit determines, according to whether the code value lies in the coding interval of a prefix character string of the required fuzzy query, whether the data includes the prefix character string,
wherein, when the code value lies in the coding interval of the prefix character string of the required fuzzy query, the data includes the prefix character string, and
when the code value does not lie in the coding interval of the prefix character string of the required fuzzy query, the data does not include the prefix character string.
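As a non-limiting illustration of claims 1 to 7, the following Python sketch shows one way the claimed steps could fit together: arithmetic coding narrows an interval for the data, a short code value is picked from that interval, its bit count is compared with that of the raw data to decide what to store, and a prefix query is answered by testing whether a stored code value falls inside the interval of the prefix. Everything concrete here is an assumption made for illustration and is not taken from the patent: the static uniform model over the digits, the `EOS` end marker, the one-byte-per-character size of the raw data, and the names `narrow`, `code_value`, `store`, and `has_prefix`. Exact `Fraction` arithmetic stands in for the interval re-expansion of claims 8 and 9, so that step is not shown.

```python
from fractions import Fraction
from math import ceil

EOS = "\0"                     # hypothetical end marker, ordered before every digit
ALPHABET = EOS + "0123456789"  # assumed static, uniform model, kept in sort order
WIDTH = Fraction(1, len(ALPHABET))


def narrow(text, low=Fraction(0), high=Fraction(1)):
    """Arithmetic-code `text` and return its half-open interval [low, high)."""
    for ch in text:
        span = high - low
        idx = ALPHABET.index(ch)  # raises ValueError for symbols outside the model
        low, high = low + span * idx * WIDTH, low + span * (idx + 1) * WIDTH
    return low, high


def code_value(data):
    """Return (value, nbits): the shortest binary fraction k / 2**nbits
    that lies inside the interval of `data` followed by the end marker."""
    low, high = narrow(data + EOS)
    nbits = 0
    while True:
        k = ceil(low * 2**nbits)          # smallest candidate at this precision
        if Fraction(k, 2**nbits) < high:
            return Fraction(k, 2**nbits), nbits
        nbits += 1


def store(database, data):
    """Keep the code value only when it needs fewer bits than the raw data."""
    value, nbits = code_value(data)
    raw_bits = 8 * len(data)              # assumption: one byte per character
    if nbits < raw_bits:
        database.append(("code", value, nbits))
    else:
        database.append(("raw", data, raw_bits))


def has_prefix(value, prefix):
    """Prefix query: test whether `value` lies in the interval of `prefix`."""
    low, high = narrow(prefix)
    return low <= value < high
```

Because the symbols are laid out in sort order and every string is closed with the end marker, distinct strings get disjoint intervals in this sketch. Equal code values therefore imply equal data, sorting by code value reproduces the lexicographic order of the data, and a data item shorter than the queried prefix cannot be matched by accident. A different probability model or termination rule would need its own check of these properties.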
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510059809.6A CN104579360B (en) | 2015-02-04 | 2015-02-04 | A kind of method and apparatus of data processing |
EP16746065.8A EP3244540A4 (en) | 2015-02-04 | 2016-01-13 | Data processing method and device |
PCT/CN2016/070805 WO2016124070A1 (en) | 2015-02-04 | 2016-01-13 | Data processing method and device |
US15/668,335 US9998145B2 (en) | 2015-02-04 | 2017-08-03 | Data processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510059809.6A CN104579360B (en) | 2015-02-04 | 2015-02-04 | A kind of method and apparatus of data processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104579360A CN104579360A (en) | 2015-04-29 |
CN104579360B true CN104579360B (en) | 2018-07-31 |
Family ID: 53094691
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510059809.6A Active CN104579360B (en) | 2015-02-04 | 2015-02-04 | A kind of method and apparatus of data processing |
Country Status (4)
Country | Link |
---|---|
US (1) | US9998145B2 (en) |
EP (1) | EP3244540A4 (en) |
CN (1) | CN104579360B (en) |
WO (1) | WO2016124070A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104579360B (en) | 2015-02-04 | 2018-07-31 | 华为技术有限公司 | A kind of method and apparatus of data processing |
CN106484753B (en) * | 2016-06-07 | 2020-01-03 | 湖南千年华光软件开发有限公司 | Data processing method |
CN110326253B (en) * | 2016-12-30 | 2021-11-09 | 罗伯特·博世有限公司 | Method and system for fuzzy keyword search of encrypted data |
CN112422491A (en) * | 2020-05-08 | 2021-02-26 | 上海幻电信息科技有限公司 | Encryption and decryption method for digital codes, server and storage medium |
CN111968379B (en) * | 2020-08-10 | 2021-08-31 | 中化信息技术有限公司 | Method, device, terminal and computer readable medium for entering license plate number |
CN112181869A (en) * | 2020-09-11 | 2021-01-05 | 中国银联股份有限公司 | Information storage method, device, server and medium |
CN112486976A (en) * | 2020-12-18 | 2021-03-12 | 咪咕文化科技有限公司 | Data processing method, device, network equipment and storage medium |
CN112565776B (en) * | 2021-02-25 | 2021-07-20 | 北京城建设计发展集团股份有限公司 | Video transcoding compression method and system |
CN116719476B (en) * | 2023-05-26 | 2024-01-02 | 广州市玄武无线科技股份有限公司 | Compressed storage method and device for mobile phone numbers, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0793349A2 (en) * | 1996-02-29 | 1997-09-03 | Gandalf Technologies Inc. | Method and apparatus for performing data compression |
CN1167951A (en) * | 1996-01-31 | 1997-12-17 | 株式会社日立制作所 | Method of and apparatus for compressing and expanding data and data processing apparatus and network system using same |
CN101031086A (en) * | 2002-10-10 | 2007-09-05 | 索尼株式会社 | Video-information encoding method and video-information decoding method |
CN102799590A (en) * | 2011-05-26 | 2012-11-28 | 安凯(广州)微电子技术有限公司 | Embedded type electronic product word stock as well as word stock generating method and word stock searching method |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3302210B2 (en) * | 1995-02-10 | 2002-07-15 | 富士通株式会社 | Data encoding / decoding method and apparatus |
US6100824A (en) * | 1998-04-06 | 2000-08-08 | National Dispatch Center, Inc. | System and method for data compression |
WO2007065352A1 (en) * | 2005-12-05 | 2007-06-14 | Huawei Technologies Co., Ltd. | Method and apparatus for realizing arithmetic coding/ decoding |
JP4555257B2 (en) * | 2006-06-06 | 2010-09-29 | パナソニック株式会社 | Image encoding device |
CN104579360B (en) * | 2015-02-04 | 2018-07-31 | 华为技术有限公司 | A kind of method and apparatus of data processing |
- 2015-02-04: CN application CN201510059809.6A, patent CN104579360B; status: Active
- 2016-01-13: EP application EP16746065.8A, patent EP3244540A4; status: Not active (Ceased)
- 2016-01-13: WO application PCT/CN2016/070805, patent WO2016124070A1; status: Active (Application Filing)
- 2017-08-03: US application US15/668,335, patent US9998145B2; status: Active
Non-Patent Citations (3)
Title |
---|
H.264中基于内容匹配的自适应二进制算术编码;石增硕;《中国优秀硕士学位论文全文数据库 信息科技辑》;20070615(第6期);第I136-347页 * |
一种改进的通过查表实现的算术编解码方法;张文妮 等;《复旦学报(自然科学版)》;20060228;第45卷(第1期);第45-48页 * |
基于自适应算术编码的字符型报文压缩技术;李玮 等;《科学技术与工程》;20130430;第13卷(第10期);第2836-2840页 * |
Also Published As
Publication number | Publication date |
---|---|
US9998145B2 (en) | 2018-06-12 |
EP3244540A1 (en) | 2017-11-15 |
EP3244540A4 (en) | 2018-01-31 |
CN104579360A (en) | 2015-04-29 |
US20170331492A1 (en) | 2017-11-16 |
WO2016124070A1 (en) | 2016-08-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||