CN104579360B - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Info

Publication number
CN104579360B
Authority
CN
China
Prior art keywords
data
code value
section
coding
character string
Prior art date
Legal status
Active
Application number
CN201510059809.6A
Other languages
Chinese (zh)
Other versions
CN104579360A (en)
Inventor
朱金伟
严龙
周庆庆
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201510059809.6A
Publication of CN104579360A
Priority to EP16746065.8A (EP3244540A4)
Priority to PCT/CN2016/070805 (WO2016124070A1)
Priority to US15/668,335 (US9998145B2)
Application granted
Publication of CN104579360B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4006 Conversion to or from arithmetic code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2468 Fuzzy queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/06 Arrangements for sorting, selecting, merging, or comparing data on individual record carriers
    • G06F7/08 Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6064 Selection of Compressor
    • H03M7/6082 Selection strategies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Automation & Control Theory (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Embodiments of the present invention provide a data processing method and apparatus. The method includes:
- encoding data by using an arithmetic coding algorithm to obtain a code value interval;
- when a code value corresponding to the data exists in the code value interval, obtaining the code value according to the code value interval;
- comparing the number of bits of the code value with the number of bits of the data to obtain a comparison result;
- performing a storage operation according to the comparison result.

Embodiments of the present invention can reduce the storage space occupied by data.

Description

Data processing method and apparatus
Technical field
The present invention relates to the field of data processing, and in particular, to a data processing method and apparatus.
Background
Arithmetic coding is another lossless compression algorithm that has proved very useful in recent years. Its core idea is to map every symbol that may occur in the source data to an element of an integer set, and to assign each symbol its probability of occurrence (the probabilities of all symbols must sum to 1). According to its probability, each symbol occupies a half-open subinterval of [0, 1). The length of the subinterval equals the probability, and the subintervals do not overlap.

Encoding proceeds as follows:
- The character string to be encoded is mapped to an integer sequence through the mapping table.
- According to the probabilities of the symbols in the source data, the source data is progressively transformed into a real-number interval within [0, 1).
- A real number in that interval is stored in the computer as the code value.

Each encoding step subdivides the interval obtained in the previous step, and the symbol probabilities stay the same at every step. During decoding, the binary code value is converted back into the corresponding integer sequence by the inverse transformation, and then mapped back to the original character string.

For example, take the integer set {0, 1, 2, 3} with the probability distribution {0.2, 0.5, 0.2, 0.1}. For the data whose input sequence is <2, 1, 0, 0, 1, 3>, the coding intervals are, in turn: [0.7, 0.9], [0.74, 0.84], [0.74, 0.76], [0.74, 0.744], [0.7408, 0.7428], [0.7426, 0.7428]. The final code value interval of the data is therefore [0.7426, 0.7428] (the coding interval of the last character), and the code value of the data is any number in [0.7426, 0.7428].
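The interval narrowing in this example can be reproduced in a few lines. The following is a minimal Python sketch. It assumes a static model in which each symbol's subinterval is laid out in ascending symbol order. Exact `Fraction` arithmetic is used only so that the bounds match the decimals above without floating-point drift. The names `PROBS`, `cumulative`, and `encode_intervals` are illustrative and do not come from the patent.

```python
from fractions import Fraction

# Static model from the example: symbols 0..3 with fixed probabilities.
PROBS = [Fraction(2, 10), Fraction(5, 10), Fraction(2, 10), Fraction(1, 10)]


def cumulative(probs):
    """Return the subinterval [start, end) of [0, 1) assigned to each symbol."""
    bounds, start = [], Fraction(0)
    for p in probs:
        bounds.append((start, start + p))
        start += p
    return bounds


def encode_intervals(symbols, probs=PROBS):
    """Narrow [0, 1) once per symbol and return every intermediate interval."""
    bounds = cumulative(probs)
    low, high = Fraction(0), Fraction(1)
    steps = []
    for s in symbols:
        width = high - low
        s_low, s_high = bounds[s]
        # The next interval is this symbol's slice of the current interval.
        low, high = low + width * s_low, low + width * s_high
        steps.append((low, high))
    return steps


for low, high in encode_intervals([2, 1, 0, 0, 1, 3]):
    print(float(low), float(high))
```

Running it prints the same six intervals as the example, ending with 0.7426 and 0.7428. Practical coders usually work with fixed-width integer ranges and renormalization rather than unbounded fractions.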
Existing arithmetic coding compresses the data directly, without considering whether compressing the data to be encoded yields any gain, and then stores the resulting code value. For some data, the code value has more bits than the data itself. As a result, the prior art can increase, rather than reduce, the storage space occupied by the data.
Summary
Embodiments of the present invention provide a data processing method and apparatus that can reduce the storage space occupied by data.
According to a first aspect, a data processing method is provided, including:
- encoding data by using an arithmetic coding algorithm to obtain a code value interval;
- when a code value corresponding to the data exists in the code value interval, obtaining the code value according to the code value interval;
- comparing the number of bits of the code value with the number of bits of the data to obtain a comparison result;
- performing a storage operation according to the comparison result.

With reference to the first aspect, in a first possible implementation, the comparison result is that the number of bits of the code value is less than the number of bits of the data. In this case, performing the storage operation according to the comparison result includes storing the code value.

With reference to the first aspect, in a second possible implementation, the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data. In this case, performing the storage operation according to the comparison result includes storing the data.

With reference to the first possible implementation, in a third possible implementation, the method further includes performing an application operation on the data according to the code value. The application operation includes at least one of equality comparison, sorting, and fuzzy query.

With reference to the third possible implementation, in a fourth possible implementation, the data is an identifier (ID) string and the application operation includes equality comparison. Performing the application operation according to the code value includes: when the code value is equal to a code value to be compared, determining that the data and the data corresponding to the code value to be compared are the same data.

With reference to the third possible implementation, in a fifth possible implementation, the data is an ID string or an alphabetic string of a field, and the application operation includes sorting. Performing the application operation according to the code value includes determining, according to the magnitude of the code value, the position of the code value among the code values to be sorted. The position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
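Sorting by code value works because of how the subintervals are laid out. When symbols are assigned subintervals in ascending order, the coding intervals of distinct strings of the same length are disjoint and appear in lexicographic order. Equal strings, conversely, always produce equal code values. The following minimal Python sketch rests on three assumptions that the patent does not specify: fixed-length ID strings, a hypothetical four-symbol model (`MODEL`), and the lower bound of the final interval as the code value.

```python
from fractions import Fraction

# Hypothetical model; subintervals are laid out in ascending symbol order.
MODEL = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}


def code_value(text):
    """Return the lower bound of the final coding interval of text."""
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        # Cumulative probability of every symbol that sorts before ch.
        start = sum((p for s, p in MODEL.items() if s < ch), Fraction(0))
        low += width * start
        width *= MODEL[ch]
    return low


ids = ["CAB", "ABD", "DAA", "ABA", "BCC"]
print(sorted(ids, key=code_value))  # ['ABA', 'ABD', 'BCC', 'CAB', 'DAA']
print(code_value("ABD") == code_value("ABD"))  # True: equal data, equal code values
```

The sketch is restricted to fixed-length IDs because strings of different lengths can tie. Under this model, for example, "A" and "AA" share the lower bound 0.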
With reference to the third possible implementation, in a sixth possible implementation, the application operation includes fuzzy query. Performing the application operation according to the code value includes determining whether the data begins with the prefix string of the required fuzzy query, according to whether the code value lies in the coding interval of that prefix string:
- when the code value lies in the coding interval of the prefix string, the data begins with the prefix string;
- when the code value does not lie in that coding interval, the data does not begin with the prefix string.
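The prefix test relies on interval nesting. Every string that begins with a given prefix is encoded inside the prefix's coding interval, so a single range check replaces decoding. The following Python sketch assumes a hypothetical five-letter model and uses the lower bound of the final interval as the code value. Under this construction, the test is reliable only when the stored data is at least as long as the prefix. A shorter string that is itself a prefix of the query can fall inside the query's interval.

```python
from fractions import Fraction

# Hypothetical model over five letters, laid out in alphabetical order.
ALPHABET = "abcde"
PROBS = [Fraction(3, 10), Fraction(2, 10), Fraction(2, 10), Fraction(2, 10), Fraction(1, 10)]


def interval(text):
    """Return the final half-open coding interval [low, high) of text."""
    low, high = Fraction(0), Fraction(1)
    for ch in text:
        i = ALPHABET.index(ch)
        start = sum(PROBS[:i], Fraction(0))
        width = high - low
        low, high = low + width * start, low + width * (start + PROBS[i])
    return low, high


def code_value(text):
    """Use the lower bound of the final interval as the stored code value."""
    return interval(text)[0]


def has_prefix(code, prefix):
    """Fuzzy prefix test: True iff code falls inside the prefix's coding interval.

    Assumes the stored data is at least as long as the prefix.
    """
    p_low, p_high = interval(prefix)
    return p_low <= code < p_high


stored = code_value("abcde")
print(has_prefix(stored, "ab"), has_prefix(stored, "ac"))  # True False
```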
With reference to the first aspect or any one of the first to sixth possible implementations, in a seventh possible implementation, encoding the data by using an arithmetic coding algorithm to obtain the code value interval includes:
- encoding the data by using the arithmetic coding algorithm to obtain a coding interval;
- re-expanding the coding interval of the data to obtain a re-expanded coding interval;
- continuing to encode the data by using the arithmetic coding algorithm, according to the re-expanded coding interval, to obtain the code value interval.

With reference to the seventh possible implementation, in an eighth possible implementation, re-expanding the coding interval of the data includes: when at least one of the following conditions is met, re-expanding the coding interval of the data to obtain the re-expanded coding interval, and recording the corresponding re-expansion character position:
- the number of characters of the data that have been encoded reaches a character-count threshold;
- the length of the coding interval of the data is less than an interval threshold.
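Both trigger conditions can be checked after each encoded character. The patent text states neither how the interval is re-expanded nor what threshold values to use, so the following Python sketch is an assumption throughout. It models re-expansion as mapping the current interval back onto [0, 1) and restarting the character counter. It only demonstrates when re-expansion fires and which character positions are recorded; on its own, it does not produce a decodable code value.

```python
from fractions import Fraction


def should_reexpand(chars_since_last, interval_length, char_threshold, interval_threshold):
    """Either condition from the text triggers re-expansion."""
    return chars_since_last >= char_threshold or interval_length < interval_threshold


def reexpansion_positions(symbols, probs, char_threshold, interval_threshold):
    """Return the character positions at which re-expansion would be recorded.

    Assumption (not from the patent): re-expansion maps the current interval
    back onto [0, 1) and restarts the character counter.
    """
    starts = [sum(probs[:i], Fraction(0)) for i in range(len(probs))]
    low, high = Fraction(0), Fraction(1)
    count, positions = 0, []
    for pos, s in enumerate(symbols):
        width = high - low
        low, high = low + width * starts[s], low + width * (starts[s] + probs[s])
        count += 1
        if should_reexpand(count, high - low, char_threshold, interval_threshold):
            positions.append(pos)                 # record the re-expansion character position
            low, high = Fraction(0), Fraction(1)  # hypothetical re-expansion onto [0, 1)
            count = 0
    return positions


probs = [Fraction(2, 10), Fraction(5, 10), Fraction(2, 10), Fraction(1, 10)]
print(reexpansion_positions([2, 1, 0, 0, 1, 3], probs, 4, Fraction(1, 1000)))  # [3]
```

With the thresholds in the usage line, the character-count condition fires after the fourth character. A larger interval threshold, such as 1/20 with a character threshold of 10, instead fires on interval length at positions 2 and 5.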
With reference to the first aspect or any one of the first to eighth possible implementations, in a ninth possible implementation, the method further includes a determining step. Before the code value is obtained according to the code value interval (when a code value corresponding to the data exists in that interval), it is determined whether a suitable code value corresponding to the data exists in the code value interval.

With reference to the ninth possible implementation, in a tenth possible implementation, the method further includes storing the data when no suitable code value corresponding to the data exists in the code value interval.

According to a second aspect, a data processing method is provided, including:
- encoding data by using an arithmetic coding algorithm to obtain a coding interval;
- re-expanding the coding interval of the data to obtain a re-expanded coding interval;
- continuing to encode the data by using the arithmetic coding algorithm, according to the re-expanded coding interval, to obtain a code value interval;
- obtaining a code value according to the code value interval;
- storing the code value.

With reference to the second aspect, in a first possible implementation, re-expanding the coding interval of the data includes: when at least one of the following conditions is met, re-expanding the coding interval of the data to obtain the re-expanded coding interval, and recording the corresponding re-expansion character position:
- the number of encoded characters of the data reaches a character-count threshold;
- the length of the coding interval of the data is less than an interval threshold.

According to a third aspect, a data processing method is provided, including:
- obtaining a code value of data and a re-expansion character position;
- decoding the code value of the data by using an arithmetic coding algorithm to obtain a decoding interval;
- rescaling the decoding interval of the data according to the re-expansion character position to obtain a rescaled decoding interval;
- continuing to decode the data by using the arithmetic coding algorithm, according to the rescaled decoding interval, to obtain the data.

With reference to the third aspect, in a first possible implementation, rescaling the decoding interval of the data according to the re-expansion character position includes:
- determining a rescaling character position according to the re-expansion character position, where the rescaling at the rescaling character position is the inverse of the re-expansion at the re-expansion character position;
- rescaling the decoding interval of the data according to the rescaling character position to obtain the rescaled decoding interval.
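Decoding replays the same interval subdivision as encoding: at each step, the decoder picks the symbol whose subinterval contains the code value. The following Python sketch covers only the plain decoding loop for the static model of the background example. It omits the rescaling step. It also assumes that the decoder knows how many symbols were encoded, because the patent does not say how the length is conveyed. The function name `decode` is illustrative.

```python
from fractions import Fraction

# Same static model as the background example.
PROBS = [Fraction(2, 10), Fraction(5, 10), Fraction(2, 10), Fraction(1, 10)]


def decode(code, length, probs=PROBS):
    """Recover `length` symbols by replaying the interval subdivision."""
    starts = [sum(probs[:i], Fraction(0)) for i in range(len(probs))]
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        width = high - low
        for s, (start, p) in enumerate(zip(starts, probs)):
            s_low, s_high = low + width * start, low + width * (start + p)
            if s_low <= code < s_high:  # the symbol whose slice holds the code value
                out.append(s)
                low, high = s_low, s_high
                break
        else:
            raise ValueError("code value is outside the decoding interval")
    return out


print(decode(Fraction(7427, 10000), 6))  # [2, 1, 0, 0, 1, 3]
```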
According to a fourth aspect, a data processing method is provided, including:
- encoding data by using an arithmetic coding algorithm to obtain a code value interval;
- obtaining a code value according to the code value interval;
- storing the code value;
- performing an application operation on the data according to the code value, where the application operation includes at least one of equality comparison, sorting, and fuzzy query.

With reference to the fourth aspect, in a first possible implementation, the data is an ID string and the application operation includes equality comparison. Performing the application operation according to the code value includes: when the code value is equal to a code value to be compared, determining that the data and the data corresponding to the code value to be compared are the same data.

With reference to the fourth aspect, in a second possible implementation, the data is an ID string or an alphabetic string of a field, and the application operation includes sorting. Performing the application operation according to the code value includes determining, according to the magnitude of the code value, the position of the code value among the code values to be sorted. The position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.

With reference to the fourth aspect, in a third possible implementation, the application operation includes fuzzy query. Performing the application operation according to the code value includes determining whether the data begins with the prefix string of the required fuzzy query, according to whether the code value lies in the coding interval of that prefix string:
- when the code value lies in that coding interval, the data begins with the prefix string;
- when it does not, the data does not begin with the prefix string.
According to a fifth aspect, a data processing apparatus is provided, including:
- an encoding unit, configured to encode data by using an arithmetic coding algorithm to obtain a code value interval;
- an obtaining unit, configured to obtain the code value according to the code value interval when a code value corresponding to the data exists in that interval;
- a comparison unit, configured to compare the number of bits of the code value with the number of bits of the data to obtain a comparison result;
- a first storage unit, configured to perform a storage operation according to the comparison result.

With reference to the fifth aspect, in a first possible implementation, the comparison result is that the number of bits of the code value is less than the number of bits of the data, and the first storage unit stores the code value according to the comparison result.

With reference to the fifth aspect, in a second possible implementation, the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data, and the first storage unit stores the data according to the comparison result.

With reference to the first possible implementation of the fifth aspect, in a third possible implementation, the apparatus further includes an application unit, configured to perform an application operation on the data according to the code value. The application operation includes at least one of equality comparison, sorting, and fuzzy query.

With reference to the third possible implementation of the fifth aspect, in a fourth possible implementation, the data is an ID string and the application operation includes equality comparison. When the code value is equal to a code value to be compared, the application unit determines that the data and the data corresponding to the code value to be compared are the same data.

With reference to the third possible implementation of the fifth aspect, in a sixth possible implementation, the data is an ID string or an alphabetic string of a field, and the application operation includes sorting. The application unit determines, according to the magnitude of the code value, the position of the code value among the code values to be sorted. The position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.

With reference to the third possible implementation of the fifth aspect, in an eighth possible implementation, the application operation includes fuzzy query. The application unit determines whether the data begins with the prefix string of the required fuzzy query, according to whether the code value lies in the coding interval of that prefix string:
- when the code value lies in that coding interval, the data begins with the prefix string;
- when it does not, the data does not begin with the prefix string.

With reference to the fifth aspect or any one of the first to ninth possible implementations of the fifth aspect, in a tenth possible implementation, the encoding unit:
- encodes the data by using the arithmetic coding algorithm to obtain a coding interval;
- re-expands the coding interval of the data to obtain a re-expanded coding interval;
- continues to encode the data by using the arithmetic coding algorithm, according to the re-expanded coding interval, to obtain the code value interval.

With reference to the tenth possible implementation of the fifth aspect, in an eleventh possible implementation, when at least one of the following conditions is met, the encoding unit re-expands the coding interval of the data to obtain the re-expanded coding interval, and records the corresponding re-expansion character position:
- the number of encoded characters of the data reaches a character-count threshold;
- the length of the coding interval of the data is less than an interval threshold.

With reference to the fifth aspect or any one of the first to eleventh possible implementations of the fifth aspect, in a twelfth possible implementation, the apparatus further includes a determining unit. Before the obtaining unit obtains the code value, the determining unit determines whether a suitable code value corresponding to the data exists in the code value interval.

With reference to the twelfth possible implementation of the fifth aspect, in a thirteenth possible implementation, the apparatus further includes a second storage unit, configured to store the data when no suitable code value corresponding to the data exists in the code value interval.
According to a sixth aspect, a data processing apparatus is provided, including:
- a first encoding unit, configured to encode data by using an arithmetic coding algorithm to obtain a coding interval;
- an expansion unit, configured to re-expand the coding interval of the data to obtain a re-expanded coding interval;
- a second encoding unit, configured to continue to encode the data by using the arithmetic coding algorithm, according to the re-expanded coding interval, to obtain a code value interval;
- an obtaining unit, configured to obtain a code value according to the code value interval;
- a storage unit, configured to store the code value.

With reference to the sixth aspect, in a first possible implementation, when at least one of the following conditions is met, the expansion unit re-expands the coding interval of the data to obtain the re-expanded coding interval, and records the corresponding re-expansion character position:
- the number of encoded characters of the data reaches a character-count threshold;
- the length of the coding interval of the data is less than an interval threshold.

According to a seventh aspect, a data processing apparatus is provided, including:
- a first obtaining unit, configured to obtain a code value of data and a re-expansion character position;
- a first decoding unit, configured to decode the code value of the data by using an arithmetic coding algorithm to obtain a decoding interval;
- a scaling unit, configured to rescale the decoding interval of the data according to the re-expansion character position to obtain a rescaled decoding interval;
- a second decoding unit, configured to continue to decode the data by using the arithmetic coding algorithm, according to the rescaled decoding interval, to obtain the data.

With reference to the seventh aspect, in a first possible implementation, the scaling unit:
- determines a rescaling character position according to the re-expansion character position, where the rescaling at the rescaling character position is the inverse of the re-expansion at the re-expansion character position;
- rescales the decoding interval of the data according to the rescaling character position to obtain the rescaled decoding interval.
According to an eighth aspect, a data processing apparatus is provided, including:
- an encoding unit, configured to encode data by using an arithmetic coding algorithm to obtain a code value interval;
- an obtaining unit, configured to obtain a code value according to the code value interval;
- a storage unit, configured to store the code value;
- an application unit, configured to perform an application operation on the data according to the code value, where the application operation includes at least one of equality comparison, sorting, and fuzzy query.

With reference to the eighth aspect, in a first possible implementation, the data is an ID string and the application operation includes equality comparison. When the code value is equal to a code value to be compared, the application unit determines that the data and the data corresponding to the code value to be compared are the same data.

With reference to the eighth aspect, in a third possible implementation, the data is an ID string or an alphabetic string of a field, and the application operation includes sorting. The application unit determines, according to the magnitude of the code value, the position of the code value among the code values to be sorted. The position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.

With reference to the eighth aspect, in a fifth possible implementation, the application operation includes fuzzy query. The application unit determines whether the data begins with the prefix string of the required fuzzy query, according to whether the code value lies in the coding interval of that prefix string:
- when the code value lies in that coding interval, the data begins with the prefix string;
- when it does not, the data does not begin with the prefix string.

Based on the foregoing technical solutions, embodiments of the present invention work as follows:
- data is encoded by using an arithmetic coding algorithm to obtain a code value interval;
- when a code value corresponding to the data exists in that interval, the code value is obtained according to the code value interval;
- the number of bits of the code value is compared with the number of bits of the data to obtain a comparison result;
- a storage operation is performed according to the comparison result.

Because the code value is stored only when it has fewer bits than the data, embodiments of the present invention can reduce the storage space occupied by data.
Brief Description of Drawings
To describe the technical solutions in embodiments of the present invention more clearly, the following briefly describes the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention. A person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
Fig. 1 is a schematic flowchart of a data compression method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of field sorting according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a fuzzy query according to an embodiment of the present invention.
Fig. 4 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
Fig. 5 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
Fig. 6 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
Fig. 7 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
Fig. 8 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
Fig. 9 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
Figure 10 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
Figure 11 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
Figure 12 is a schematic block diagram of a data processing apparatus according to an embodiment of the present invention.
Figure 13 is a schematic block diagram of a data processing apparatus according to another embodiment of the present invention.
Figure 14 is a schematic block diagram of a data processing apparatus according to another embodiment of the present invention.
Figure 15 is a schematic block diagram of a data processing apparatus according to another embodiment of the present invention.
Figure 16 is a schematic block diagram of a data processing apparatus according to another embodiment of the present invention.
Figure 17 is a schematic block diagram of a data processing apparatus according to another embodiment of the present invention.
Figure 18 is a schematic block diagram of a data processing apparatus according to another embodiment of the present invention.
Figure 19 is a schematic block diagram of a data processing apparatus according to another embodiment of the present invention.
Detailed Description of Embodiments
The following clearly and completely describes the technical solutions in embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a data compression method according to an embodiment of the present invention. The method shown in Fig. 1 may be performed by the data compression apparatus in Fig. 4, and includes the following steps.
110: Encode the data using an arithmetic coding algorithm to obtain a code value interval.
Specifically, the code value interval may be the coding interval corresponding to the last character of the data.
It should be understood that the data in the embodiments of the present invention may be identifier (ID)-type character strings, alphabetic character strings of fields, and the like. ID-type character strings may include order numbers, book index numbers, vehicle license plate numbers, product document numbers, International Mobile Subscriber Identity (IMSI) numbers, International Mobile Equipment Identity (IMEI) codes, and so on; alphabetic character strings may include the phonetic transcriptions of Chinese, Korean, Japanese, and so on.
120: When a code value corresponding to the data exists in the code value interval, obtain the code value from the code value interval.
In other words, when a suitable code value can be found in the code value interval, the code value is obtained from that interval. A suitable code value may be, for example, one whose binary representation has no more than 16, 32, or 64 bits.
It should be noted that the code value may be obtained from the code value interval using an existing method, or according to a preset condition, for example, that the binary representation of the code value has no more than 16, 32, or 64 bits. The embodiments of the present invention impose no limitation on this.
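The text leaves the extraction method open. A minimal sketch, assuming one common choice: take the shortest binary fraction m / 2^k that lies in the half-open code value interval [low, high), and report failure when k would exceed the bit budget (16, 32, or 64 bits in the text). Exact `fractions.Fraction` arithmetic avoids floating-point rounding; the name `shortest_code_value` is hypothetical.

```python
from fractions import Fraction
from typing import Optional, Tuple


def shortest_code_value(low: Fraction, high: Fraction,
                        max_bits: int = 64) -> Optional[Tuple[int, int]]:
    """Return (m, k) for the shortest binary fraction m / 2**k with
    low <= m / 2**k < high, or None if more than max_bits bits are needed.

    Hypothetical helper: the text only requires that the code value lie in
    the interval and fit a bit budget, not this particular selection rule.
    """
    for k in range(1, max_bits + 1):
        scale = 1 << k
        # Smallest numerator m with m / 2**k >= low, i.e. ceil(low * 2**k).
        m = -((-low.numerator * scale) // low.denominator)
        if Fraction(m, scale) < high:
            return m, k
    return None  # no suitable code value within the bit budget


# Interval [0.74, 0.76) from the fuzzy-query example in this section.
print(shortest_code_value(Fraction(74, 100), Fraction(76, 100)))  # (3, 2): binary 0.11 = 0.75
```

Searching from short to long lengths guarantees the first hit is the shortest code value, which is what the bit-count comparison in step 130 rewards.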
130: Compare the number of bits of the code value with the number of bits of the data to obtain a comparison result.
Specifically, the comparison result may be that the number of bits of the code value is less than, equal to, or greater than the number of bits of the data.
140: Perform a storage operation according to the comparison result.
Specifically, the data is encoded using an arithmetic coding algorithm to obtain a code value interval. It is then determined whether a suitable code value can be obtained in this code value interval (the coding interval corresponding to the last character of the data). If not, the data is stored directly without arithmetic coding. If a suitable code value can be obtained, the number of bits of the code value is compared with the number of bits of the data (that is, the compression gain is evaluated): if the suitable code value has at least as many bits as are needed to represent the original data, there is no compression gain, so arithmetic coding is abandoned and the data is stored directly; if the suitable code value has fewer bits than the original data, there is a compression gain, and the code value is stored.
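The decision in steps 110 to 140 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the static symbol model is borrowed from the fuzzy-query example in this section, each raw character is assumed to occupy 8 bits, and the helper names (`encode_interval`, `shortest_code_value`, `store`) are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Optional, Tuple

# Hypothetical static model over the symbols '0'..'3', borrowed from the
# fuzzy-query example in this section; a real system would derive its own.
MODEL: Dict[str, Fraction] = {"0": Fraction(2, 10), "1": Fraction(5, 10),
                              "2": Fraction(2, 10), "3": Fraction(1, 10)}


def encode_interval(text: str) -> Tuple[Fraction, Fraction]:
    """Step 110: narrow [0, 1) once per character; return (low, high)."""
    cum, total = {}, Fraction(0)
    for sym in sorted(MODEL):              # cumulative probability in symbol order
        cum[sym] = total
        total += MODEL[sym]
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        low += width * cum[ch]
        width *= MODEL[ch]
    return low, low + width


def shortest_code_value(low: Fraction, high: Fraction,
                        max_bits: int) -> Optional[Tuple[int, int]]:
    """Step 120: shortest m / 2**k in [low, high), or None beyond max_bits."""
    for k in range(1, max_bits + 1):
        m = -((-low.numerator << k) // low.denominator)  # ceil(low * 2**k)
        if Fraction(m, 1 << k) < high:
            return m, k
    return None


def store(text: str, bits_per_char: int = 8, max_bits: int = 64) -> tuple:
    """Steps 130-140: keep the code value only if it is strictly shorter."""
    low, high = encode_interval(text)
    code = shortest_code_value(low, high, max_bits)
    if code is None:                       # no suitable code value: store raw data
        return ("raw", text)
    m, k = code
    if k < len(text) * bits_per_char:      # compression gain: store the code value
        return ("code", m, k)
    return ("raw", text)                   # no gain: abandon arithmetic coding


print(store("212132"))                     # ('code', 1687, 11)
print(store("212132", max_bits=8))         # ('raw', '212132')
```

A bare code value does not say how many characters it encodes, so a practical store would also keep the string length (or a terminating symbol) for decoding.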
Therefore, in the embodiments of the present invention, the data is encoded using an arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, the code value is obtained from that interval; the number of bits of the code value is compared with the number of bits of the data to obtain a comparison result; and a storage operation is performed according to the comparison result. This reduces erroneous encoding and reduces the storage space occupied by the data.
Optionally, in another embodiment, the comparison result is that the number of bits of the code value is less than the number of bits of the data, and in step 140 the code value is stored according to the comparison result.
Specifically, when the code value has fewer bits than the data, there is a compression gain, so the code value is stored.
Alternatively, in another embodiment, the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data, and in step 140 the data is stored according to the comparison result.
Specifically, when the code value has at least as many bits as the data, there is no compression gain, so the data itself is stored.
Optionally, in another embodiment, when the code value is stored, the method further includes performing an application operation on the data according to the code value, where the application operation includes at least one of equality comparison, sorting, and fuzzy query.
Specifically, the data is encoded using an arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the interval, the code value is obtained from it; the number of bits of the code value is compared with the number of bits of the data to obtain a comparison result; when the code value has fewer bits than the data, the code value is stored, and at least one of equality comparison, sorting, and fuzzy query is performed on the data according to the code value. For example, the data may be compared for equality, sorted, or fuzzy-queried using its code value. These application operations are described separately below.
Specifically, in another embodiment, the data is an ID-type character string, the application operation includes equality comparison, and performing the application operation according to the code value includes: performing an equality comparison of the data according to the code value.
Further, in another embodiment, performing the equality comparison according to the code value includes: when the code value is equal to a code value to be compared, determining that the data and the data corresponding to the code value to be compared are identical.
It should be understood that the code value to be compared is the code value of the data to be compared (that is, the data corresponding to that code value). Specifically, comparing code values for equality can be understood as matching data: when the code values of two pieces of data are equal, the two pieces of data can be determined to be identical, that is, the match succeeds; when the two code values are unequal, the two pieces of data are different, that is, the match fails.
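A sketch of equality matching on stored code values, under the same assumed four-symbol model used in the fuzzy-query example of this section. One caveat the sketch makes explicit: with a static model, a string and one of its prefixes can share the shortest code value (here "1" and "11" both map to 0.5), so the hypothetical key compares the code value together with the string length. For fixed-length ID strings the length check is redundant. The names `code_value` and `same_id` are hypothetical.

```python
from fractions import Fraction

# Hypothetical static model over '0'..'3' (probabilities from the fuzzy-query example).
MODEL = {"0": Fraction(2, 10), "1": Fraction(5, 10),
         "2": Fraction(2, 10), "3": Fraction(1, 10)}
CUM = {"0": Fraction(0), "1": Fraction(2, 10),
       "2": Fraction(7, 10), "3": Fraction(9, 10)}


def code_value(text: str) -> Fraction:
    """Arithmetic-code text, then return the shortest binary fraction in its interval."""
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        low += width * CUM[ch]
        width *= MODEL[ch]
    k = 1
    while True:  # terminates because width > 0
        m = -((-low.numerator << k) // low.denominator)  # ceil(low * 2**k)
        if Fraction(m, 1 << k) < low + width:
            return Fraction(m, 1 << k)
        k += 1


def same_id(a: str, b: str) -> bool:
    """Equality comparison on the stored key (code value, length), no decoding."""
    return (code_value(a), len(a)) == (code_value(b), len(b))


print(same_id("210312", "210312"))                            # True
print(same_id("210312", "210231"))                            # False
print(code_value("1"), code_value("11"), same_id("1", "11"))  # 1/2 1/2 False
```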
Alternatively, in another embodiment, the data is an ID-type character string or an alphabetic character string of a field, the application operation includes sorting, and performing the application operation according to the code value includes: sorting the data according to the code value.
Further, in another embodiment, sorting the data according to the code value includes: determining, according to the magnitude of the code value, the position of the code value among the code values to be sorted, where the position of the code value indicates the position of the data among the data corresponding to those code values.
Specifically, sorting the data can be understood as ordering multiple pieces of data. For example, given 5 pieces of data with 5 corresponding code values sorted in ascending order, if the current code value is the 4th of the 5 code values, the data corresponding to it is ranked 4th among the 5 pieces of data.
In existing database implementations, compression and query operations are usually considered separately; that is, data storage techniques and the related query optimization techniques are treated independently. An important function of a database is to store important descriptions of the entities of interest, together with related information such as how those entities develop. When such descriptions are lengthy, they not only occupy a large amount of storage space but also make queries inconvenient. In existing approaches, query operations on data (such as string comparison and string sorting) require comparing the characters of the strings one by one, which results in low query efficiency. In contrast, the embodiments of the present invention require no arithmetic decoding: data is compared (matched) directly according to its code value and/or sorted according to its code value, so that query operations on what was originally a complex data type become equivalent queries on code values, which is fast and convenient.
For example, consider application scenarios in which only digits or only letters occur in the data (the probabilities are then assigned entirely to the digits or entirely to the letters). For example, the data may be text in a non-Latin script: Chinese, Korean, and Japanese all have corresponding phonetic transcriptions. The text can be converted into its phonetic representation, that is, a character string containing only letters. The data is then encoded using an arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the interval and has fewer bits than the data, the code value is stored. The fields corresponding to the data are then sorted according to their code values.
For example, the occurrence probabilities of the letters in Chinese pinyin are: A (0.107), B (0.014), C (0.017), D (0.030), E (0.062), F (0.009), G (0.060), H (0.067), I (0.141), J (0.023), K (0.008), L (0.017), M (0.014), N (0.117), O (0.065), P (0.008), Q (0.013), R (0.006), S (0.026), T (0.015), U (0.096), V (0.001), W (0.010), X (0.020), Y (0.028), Z (0.026). As shown in Fig. 2, the fields "优秀" (excellent), "良好" (good), and "及格" (pass) are encoded with the letters arranged in alphabetical order. The alphabetic string for "excellent" is "youxiu", with code value 0.96684845; for "good" it is "lianghao", with code value 0.544375656; and for "pass" it is "jige", with code value 0.516228. Sorted in ascending order, the code values are 0.516228, 0.544375656, and 0.96684845, corresponding to "pass", "good", and "excellent", respectively.
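The ordering in this example can be reproduced with exact arithmetic. The sketch assumes the letters' sub-intervals are laid out in alphabetical order with the probabilities listed above, and sorts by the lower end of each field's coding interval rather than by the code values quoted in the text. Because none of the three strings is a prefix of another, their intervals are disjoint, so any code value chosen inside each interval gives the same order. The names `PROB`, `CUM`, and `coding_interval` are hypothetical.

```python
from fractions import Fraction
from string import ascii_lowercase
from typing import Dict, Tuple

# Letter probabilities (per mille) from the pinyin table above, in a..z order.
PER_MILLE = [107, 14, 17, 30, 62, 9, 60, 67, 141, 23, 8, 17, 14,
             117, 65, 8, 13, 6, 26, 15, 96, 1, 10, 20, 28, 26]
PROB: Dict[str, Fraction] = {c: Fraction(p, 1000) for c, p in zip(ascii_lowercase, PER_MILLE)}

# Cumulative probability below each letter; laying the sub-intervals out in
# alphabetical order is what makes numeric order match alphabetical order.
CUM: Dict[str, Fraction] = {}
_running = Fraction(0)
for _c in ascii_lowercase:
    CUM[_c] = _running
    _running += PROB[_c]


def coding_interval(pinyin: str) -> Tuple[Fraction, Fraction]:
    """Arithmetic-code a lowercase pinyin string; return (low, high)."""
    low, width = Fraction(0), Fraction(1)
    for c in pinyin:
        low += width * CUM[c]
        width *= PROB[c]
    return low, low + width


fields = {"youxiu": "excellent", "lianghao": "good", "jige": "pass"}
# Sort by a value inside each interval (here its lower end) instead of by text.
ranked = sorted(fields, key=lambda s: coding_interval(s)[0])
print(ranked)                         # ['jige', 'lianghao', 'youxiu']
print([fields[s] for s in ranked])    # ['pass', 'good', 'excellent']
```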
Alternatively, in another embodiment, the application operation includes fuzzy query, and performing the application operation according to the code value includes: performing a fuzzy query on the data according to the code value.
Further, in another embodiment, performing a fuzzy query on the data according to the code value includes: determining whether the data begins with the prefix string of the required fuzzy query according to whether the code value lies within the coding interval of that prefix string, where the data begins with the prefix string when the code value lies within its coding interval, and does not begin with it when the code value lies outside that interval.
In other words, when the code value lies within the coding interval of the prefix string of the required fuzzy query, the data satisfies the fuzzy query; when it does not, the data does not satisfy the fuzzy query.
Specifically, compressing character strings with arithmetic coding yields a series of code values. Each code value comes from the coding interval obtained by encoding a string, and the coding intervals of distinct strings, neither of which is a prefix of the other, do not overlap. Note also that the coding interval of a string is always contained in the coding interval of each of its prefixes. For example, the coding interval of the string 'A12986572' is necessarily contained in the coding intervals of its prefixes such as 'A1298' and 'A12'. A fuzzy query can therefore be performed simply by checking whether a code value lies within the coding interval of the prefix string of the required fuzzy query. For example, consider the symbol set {0, 1, 2, 3} with occurrence probabilities {0.2, 0.5, 0.2, 0.1}, and the fuzzy query %210xxx applied to the data 212132, 210312, 210231, and 211123. As shown in Fig. 3, the coding interval of "210" is [0.74, 0.76], and the code values of 212132, 210312, 210231, and 211123 are 0.8238, 0.7592, 0.7576, and 0.7923, respectively. Since 0.7592 and 0.7576 lie within the coding interval [0.74, 0.76] of the prefix string of the fuzzy query, while 0.8238 and 0.7923 do not, 210312 and 210231 satisfy the fuzzy query, whereas 212132 and 211123 do not.
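The prefix test above reduces to an interval-membership check. A minimal sketch under the model given in the example ({0, 1, 2, 3} with probabilities {0.2, 0.5, 0.2, 0.1}). As an assumption, each stored code value is taken to be the lower end of its string's coding interval, so the stored numbers differ slightly from the four-decimal values quoted in the text, and the prefix interval is treated as half-open, [0.74, 0.76). The names `coding_interval` and `prefix_query` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

MODEL = {"0": Fraction(2, 10), "1": Fraction(5, 10),
         "2": Fraction(2, 10), "3": Fraction(1, 10)}
CUM = {"0": Fraction(0), "1": Fraction(2, 10),
       "2": Fraction(7, 10), "3": Fraction(9, 10)}


def coding_interval(text: str) -> Tuple[Fraction, Fraction]:
    """Arithmetic-code text; return the half-open interval (low, high)."""
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        low += width * CUM[ch]
        width *= MODEL[ch]
    return low, low + width


def prefix_query(prefix: str, stored: Dict[str, Fraction]) -> List[str]:
    """Return the keys whose stored code value lies in the prefix's interval."""
    lo, hi = coding_interval(prefix)
    return [key for key, code in stored.items() if lo <= code < hi]


data = ["212132", "210312", "210231", "211123"]
# Assumed code value: the lower end of each string's own coding interval.
stored = {s: coding_interval(s)[0] for s in data}
print(coding_interval("210"))        # (Fraction(37, 50), Fraction(19, 25)), i.e. [0.74, 0.76)
print(prefix_query("210", stored))   # ['210312', '210231']
```

Only one interval computation per query is needed (for the prefix); the stored code values are compared as plain numbers, with no decoding.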
Therefore, in the fuzzy query operation of the embodiments of the present invention, when the number of index characters exceeds 2, the savings offset the operations needed to evaluate the field.
Optionally, in another embodiment, in step 110, the data is encoded using an arithmetic coding algorithm to obtain a coding interval; the coding interval of the data is re-expanded to obtain a re-expanded coding interval, and the corresponding re-expansion character position is recorded; starting from the re-expanded coding interval, the data continues to be encoded using the arithmetic coding algorithm to obtain the code value interval. The recorded re-expansion character position allows a decoding device to determine a re-scaling character position from the code value and the re-expansion character position, re-scale the decoding interval at that position, and finally recover the data.
Specifically, the embodiments of the present invention may re-expand the coding interval corresponding to any character, or may re-expand the coding interval according to a preset condition.
Therefore, the embodiments of the present invention can re-expand the coding interval of the data. Because the coding interval is re-expanded, the code value interval is enlarged accordingly, so a suitable code value is easier to find in the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, re-expanding the interval makes it possible to represent sufficiently long string data within a limited number of bits.
Further, in another embodiment, in step 110, when at least one of the following conditions is met, the coding interval of the data is re-expanded to obtain a re-expanded coding interval: the number of characters of the data that have been encoded reaches a character-count threshold; or the length of the coding interval of the data is less than an interval threshold.
Specifically, when the number of characters in the data exceeds the preset character-count threshold, in step 110 the data is encoded using an arithmetic coding algorithm; when the number of encoded characters reaches the preset character-count threshold, the coding interval of the character at that threshold is re-expanded and the corresponding re-expansion character position is recorded; starting from the re-expanded coding interval, the data continues to be encoded using the arithmetic coding algorithm to obtain the code value interval.
In other words, while the characters of the data are encoded one by one using the arithmetic coding algorithm, once the number of encoded characters reaches the preset character-count threshold, the coding interval of the character at that threshold is re-expanded and the corresponding re-expansion character position is recorded; encoding then continues from the re-expanded coding interval to obtain the code value interval.
For example, if the current data contains 12 characters and the preset character-count threshold is 10, then while the 12 characters are encoded using the arithmetic coding algorithm, the coding interval of the 10th character is re-expanded and the re-expansion character position is recorded as the position of the 10th character. Encoding then continues from the re-expanded coding interval for the remaining data (the 11th and 12th characters), finally yielding the code value interval (the coding interval corresponding to the 12th character).
As the data string grows longer, the coding interval produced by encoding becomes smaller and smaller, and a suitable code value is harder to find in a smaller coding interval. Therefore, to avoid as far as possible the problem that arithmetic coding cannot be performed correctly, the embodiments of the present invention re-expand the coding interval when the number of characters reaches the preset character-count threshold. Because the coding interval is re-expanded, the code value interval is enlarged accordingly, so a suitable code value is easier to find in the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, re-expanding the interval makes it possible to represent sufficiently long string data within a limited number of bits.
When the length of the coding interval of the data is less than the preset threshold, in step 110 the coding interval of the data is re-expanded; starting from the re-expanded coding interval, the data continues to be encoded using the arithmetic coding algorithm to obtain the code value interval.
In other words, while the characters of the data are encoded one by one using the arithmetic coding algorithm, the coding interval becomes smaller and smaller; when it falls below the preset threshold, it is re-expanded. Encoding of the remaining characters of the data then continues from the re-expanded coding interval using the arithmetic coding algorithm, finally yielding the code value interval.
For example, if the current data contains 12 characters and the preset threshold is 0.05, then while the 12 characters are encoded using the arithmetic coding algorithm, the coding interval of the data is re-expanded whenever it falls below 0.05. For example, if the coding interval after the 7th character has length 0.1 and after the 8th character has length 0.04, the coding interval 0.04 of the 8th character is re-expanded, for example from 0.04 to 1 or 10. Encoding then continues from the re-expanded coding interval (1 or 10) for the remaining data (the 9th to 12th characters), yielding the code value interval (the coding interval corresponding to the 12th character).
As the data string grows longer, the coding interval produced by encoding becomes smaller and smaller, and a suitable code value is harder to find in a smaller coding interval. Therefore, to avoid as far as possible the problem that arithmetic coding cannot be performed correctly, the embodiments of the present invention re-expand the coding interval. Because the coding interval is re-expanded, the code value interval is enlarged accordingly, so a suitable code value is easier to find in the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, re-expanding the interval makes it possible to represent sufficiently long string data within a limited number of bits.
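The text does not spell out how the re-expanded interval relates to the interval before expansion. One possible reading, assumed here because it keeps decoding lossless: when either condition fires, the encoder fixes a code value for the characters encoded so far, records the character position, and restarts (re-expands) from [0, 1). The four-symbol model (from the fuzzy-query example) and the names `encode_with_reexpansion` and `pick_code` are hypothetical; both thresholds from the text are supported.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

MODEL = {"0": Fraction(2, 10), "1": Fraction(5, 10),
         "2": Fraction(2, 10), "3": Fraction(1, 10)}
CUM = {"0": Fraction(0), "1": Fraction(2, 10),
       "2": Fraction(7, 10), "3": Fraction(9, 10)}


def pick_code(low: Fraction, high: Fraction) -> Fraction:
    """Shortest binary fraction m / 2**k in [low, high)."""
    k = 1
    while True:
        m = -((-low.numerator << k) // low.denominator)  # ceil(low * 2**k)
        if Fraction(m, 1 << k) < high:
            return Fraction(m, 1 << k)
        k += 1


def encode_with_reexpansion(text: str,
                            interval_threshold: Optional[Fraction] = None,
                            char_threshold: Optional[int] = None
                            ) -> Tuple[List[Fraction], List[int]]:
    """Return (one code value per segment, 1-based re-expansion positions)."""
    codes: List[Fraction] = []
    positions: List[int] = []
    low, width, count = Fraction(0), Fraction(1), 0
    for i, ch in enumerate(text, start=1):
        low += width * CUM[ch]
        width *= MODEL[ch]
        count += 1
        too_narrow = interval_threshold is not None and width < interval_threshold
        too_long = char_threshold is not None and count >= char_threshold
        if (too_narrow or too_long) and i < len(text):
            codes.append(pick_code(low, low + width))      # close this segment
            positions.append(i)                             # record the position
            low, width, count = Fraction(0), Fraction(1), 0  # re-expand to [0, 1)
    codes.append(pick_code(low, low + width))
    return codes, positions


codes, positions = encode_with_reexpansion("212132210312", interval_threshold=Fraction(5, 100))
print(positions)                                                     # [3, 6, 9]
print(encode_with_reexpansion("212132210312", char_threshold=10)[1])  # [10]
```

With the 12-character input and a 0.05 interval threshold, the interval is re-expanded after characters 3, 6, and 9; with a character-count threshold of 10 it is re-expanded once, after the 10th character, matching the 12-character example in the text.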
It should be noted that all information about the interval expansion must be transmitted to the decoder; that is, when the encoder sends the binary code value to the decoder, it also sends the shift scheme information. This keeps the encoder and decoder synchronized and ensures that decoding produces the correct result.
Fig. 4 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method shown in Fig. 4 may be executed by a data processing device, specifically by an encoding device. As shown in Fig. 4, the method includes the following steps:
410: Encode the data using an arithmetic coding algorithm to obtain a coding interval.
420: Re-expand the coding interval of the data to obtain a re-expanded coding interval.
Specifically, the embodiments of the present invention may re-expand the coding interval corresponding to any character, or may re-expand the coding interval according to a preset condition.
430: Starting from the re-expanded coding interval, continue encoding the data using the arithmetic coding algorithm to obtain a code value interval.
440: Obtain a code value from the code value interval.
450: Store the code value.
Therefore, the embodiments of the present invention re-expand the coding interval of the data. Because the coding interval is re-expanded, the code value interval is enlarged accordingly, so a suitable code value is easier to find in the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, re-expanding the interval makes it possible to represent sufficiently long string data within a limited number of bits.
Further, in another embodiment, in step 420, when at least one of the following conditions is met, the coding interval of the data is re-expanded to obtain a re-expanded coding interval, and the corresponding re-expansion character position is recorded: the number of characters of the data that have been encoded reaches a character-count threshold; or the length of the coding interval of the data is less than an interval threshold. The recorded re-expansion character position allows a decoding device to determine a re-scaling character position from the code value and the re-expansion character position, re-scale the decoding interval at that position, and finally recover the data.
Specifically, when the number of characters in the data exceeds the preset character-count threshold, the data is encoded using an arithmetic coding algorithm; when the number of encoded characters reaches the preset character-count threshold, the coding interval of the character at that threshold is re-expanded; starting from the re-expanded coding interval, the data continues to be encoded using the arithmetic coding algorithm to obtain the code value interval.
In other words, while the characters of the data are encoded one by one using the arithmetic coding algorithm, once the number of encoded characters reaches the preset character-count threshold, the coding interval of the character at that threshold is re-expanded; encoding then continues from the re-expanded coding interval to obtain the code value interval.
For example, if the current data contains 12 characters and the preset character-count threshold is 10, then while the 12 characters are encoded using the arithmetic coding algorithm, the coding interval of the 10th character is re-expanded and the re-expansion character position is recorded as the position of the 10th character. Encoding then continues from the re-expanded coding interval for the remaining data (the 11th and 12th characters), finally yielding the code value interval (the coding interval corresponding to the 12th character).
As the data string grows longer, the coding interval produced by encoding becomes smaller and smaller, and a suitable code value is harder to find in a smaller coding interval. Therefore, to avoid as far as possible the problem that arithmetic coding cannot be performed correctly, the embodiments of the present invention re-expand the coding interval when the number of characters reaches the preset character-count threshold. Because the coding interval is re-expanded, the code value interval is enlarged accordingly, so a suitable code value is easier to find in the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, re-expanding the interval makes it possible to represent sufficiently long string data within a limited number of bits.
When the length of the coding interval of the data is less than the preset threshold, the coding interval of the data is re-expanded; starting from the re-expanded coding interval, the data continues to be encoded using the arithmetic coding algorithm to obtain the code value interval.
In other words, while the characters of the data are encoded one by one using the arithmetic coding algorithm, the coding interval becomes smaller and smaller; when it falls below the preset threshold, it is re-expanded. Encoding of the remaining characters of the data then continues from the re-expanded coding interval using the arithmetic coding algorithm, finally yielding the code value interval.
For example, if the current data contains 12 characters and the preset threshold is 0.05, then while the 12 characters are encoded using the arithmetic coding algorithm, the coding interval of the data is re-expanded whenever it falls below 0.05. For example, if the coding interval after the 7th character has length 0.1 and after the 8th character has length 0.04, the coding interval 0.04 of the 8th character is re-expanded, for example from 0.04 to 1 or 10. Encoding then continues from the re-expanded coding interval (1 or 10) for the remaining data (the 9th to 12th characters), yielding the code value interval (the coding interval corresponding to the 12th character).
As the data string grows longer, the coding interval produced by encoding becomes smaller and smaller, and a suitable code value is harder to find in a smaller coding interval. Therefore, to avoid as far as possible the problem that arithmetic coding cannot be performed correctly, the embodiments of the present invention re-expand the coding interval. Because the coding interval is re-expanded, the code value interval is enlarged accordingly, so a suitable code value is easier to find in the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, re-expanding the interval makes it possible to represent sufficiently long string data within a limited number of bits.
It should be noted that all information about the interval expansion must be transmitted to the decoder; that is, when the encoder sends the binary code value to the decoder, it also sends the shift scheme information. This keeps the encoder and decoder synchronized and ensures that decoding produces the correct result.
Fig. 5 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method shown in Fig. 5 may be executed by a data processing device, specifically by a decoding device. As shown in Fig. 5, the method includes the following steps:
510: Obtain the code value of the data and the re-expansion character position.
520: Decode the code value of the data using the arithmetic coding algorithm to obtain a decoding interval.
530: Re-scale the decoding interval of the data according to the re-expansion character position to obtain a re-scaled decoding interval.
540: Starting from the re-scaled decoding interval, continue decoding using the arithmetic coding algorithm to obtain the data.
Therefore, the embodiments of the present invention re-scale the decoding interval when decoding a code value produced with coding-interval re-expansion, which avoids erroneous decoding and achieves correct decoding.
Specifically, in another embodiment, in step 530, a re-scaling character position is determined according to the re-expansion character position, where the re-expansion character position and the re-scaling character position are mutually inverse; the decoding interval of the data is then re-scaled at the re-scaling character position to obtain the re-scaled decoding interval. The re-expansion character position marks the point at which, during encoding, the number of encoded characters reached the character-count threshold or the length of the coding interval of the data fell below the interval threshold.
It should be understood that the re-expansion character position and the re-scaling character position being mutually inverse means that they mirror each other. For example, if the current data contains 12 characters and the coding interval of the 10th character was re-expanded, the re-expansion character position is the position of the 10th character, from which the re-scaling character position can be determined to be the position of the 3rd character (counting from the other end: 12 - 10 + 1 = 3).
It should be understood that the data processing method shown in Fig. 5 corresponds to the data processing method shown in Fig. 4; the difference is that the decoding process shown in Fig. 5 is the inverse of the encoding process shown in Fig. 4. The method of Fig. 5 can be obtained by inverting the process of Fig. 4; to avoid repetition, details are not described herein again.
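A matching decoder sketch, under an assumed segment-based reading of re-expansion (the encoder restarts at [0, 1) after each recorded position, so the decoder re-scales to [0, 1) at the same positions) and a hypothetical four-symbol model taken from the fuzzy-query example. The code values below are chosen by hand inside each segment's coding interval ([0.81, 0.83), [0.685, 0.695), [0.74, 0.76), [0.955, 0.965)). The helper `mirror_position` only shows the arithmetic behind the 12-character example in the text (position 10 from one end is position 12 - 10 + 1 = 3 from the other); the decoder itself indexes positions from the start. Both function names are hypothetical.

```python
from fractions import Fraction
from typing import List

MODEL = {"0": Fraction(2, 10), "1": Fraction(5, 10),
         "2": Fraction(2, 10), "3": Fraction(1, 10)}
CUM = {"0": Fraction(0), "1": Fraction(2, 10),
       "2": Fraction(7, 10), "3": Fraction(9, 10)}


def decode_with_rescaling(codes: List[Fraction], positions: List[int], length: int) -> str:
    """Decode one code value per segment; segment ends are the recorded positions."""
    out: List[str] = []
    start = 0
    for code, end in zip(codes, positions + [length]):
        low, width = Fraction(0), Fraction(1)   # re-scaled decoding interval
        for _ in range(end - start):
            target = (code - low) / width       # where the code sits, relative to [0, 1)
            sym = next(s for s in sorted(MODEL)
                       if CUM[s] <= target < CUM[s] + MODEL[s])
            out.append(sym)
            low += width * CUM[sym]             # narrow exactly as the encoder did
            width *= MODEL[sym]
        start = end
    return "".join(out)


def mirror_position(position: int, length: int) -> int:
    """Position counted from the opposite end of a string of the given length."""
    return length - position + 1


codes = [Fraction("0.82"), Fraction("0.69"), Fraction("0.75"), Fraction("0.96")]
print(decode_with_rescaling(codes, [3, 6, 9], 12))  # 212132210312
print(mirror_position(10, 12))                       # 3
```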
Fig. 6 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method shown in Fig. 6 may be executed by a data processing device. As shown in Fig. 6, the method includes the following steps:
610: Encode the data using an arithmetic coding algorithm to obtain a code value interval.
620: Obtain a code value from the code value interval.
630: Store the code value.
640: Perform an application operation on the data according to the code value, where the application operation includes at least one of equality comparison, sorting, and fuzzy query.
Therefore, the embodiments of the present invention encode data to obtain a code value and perform at least one of equality comparison, sorting, and fuzzy query on the data according to the code value. Unlike existing approaches, which perform equality comparison, sorting, and fuzzy query on the source data, this turns processing of originally complex data into equivalent processing on code values, which is fast and convenient.
Optionally, as another embodiment, the data is an ID-type character string and the application operation includes equality comparison; in 640, equality comparison of the data is performed according to the code value.
Further, as another embodiment, in 640, when the code value is equal to a code value to be compared, it is determined that the data and the data corresponding to the code value to be compared are identical.
Specifically, equality comparison of code values can be understood as matching of data. For example, when two code values are equal, it can be determined that the data corresponding to the two code values are identical, that is, the match succeeds; when two code values are unequal, it can be determined that the corresponding data are different, that is, the match fails.
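The equality comparison can be sketched as follows, assuming the same illustrative four-symbol model and shortest-decimal code value rule used for the fuzzy-query example (both are assumptions, as are the names `code_value` and `same_data`). Because the code-value rule is deterministic, identical strings always yield identical code values. One caveat the sketch makes visible: a string and its extension can share a code value (under this model, "2" and "20" both receive 0.7), so the comparison is only safe for fixed-length IDs or when the string length is stored alongside the code value.

```python
from fractions import Fraction
import math

# Illustrative model: probabilities and cumulative lower bounds of '0'..'3'
PROBS = {"0": Fraction(2, 10), "1": Fraction(5, 10),
         "2": Fraction(2, 10), "3": Fraction(1, 10)}
CUM = {"0": Fraction(0), "1": Fraction(2, 10),
       "2": Fraction(7, 10), "3": Fraction(9, 10)}


def code_value(text):
    """Arithmetic-encode `text`, then return the shortest decimal in [low, high)."""
    low, width = Fraction(0), Fraction(1)
    for sym in text:
        low += width * CUM[sym]
        width *= PROBS[sym]
    digits = 1
    while Fraction(math.ceil(low * 10**digits), 10**digits) >= low + width:
        digits += 1
    return Fraction(math.ceil(low * 10**digits), 10**digits)


def same_data(code_a, code_b):
    """Equal code values -> match (identical data); otherwise no match."""
    return code_a == code_b


a, b, c = code_value("210312"), code_value("210312"), code_value("211123")
print(same_data(a, b), same_data(a, c))  # True False
```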
Alternatively, as another embodiment, the data is an ID-type character string or an alphabetic string of a field, and the application operation includes sorting; in 640, the data is sorted according to the code value.
Further, as another embodiment, in 640, the position of the code value among the code values to be sorted is determined according to the magnitude of the code value, and the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
Specifically, sorting the data can be understood as ordering multiple pieces of data. For example, there are 5 pieces of data with 5 corresponding code values, and the 5 code values are sorted in ascending order; if the current code value is the 4th of the 5 code values, the data corresponding to the 4th code value ranks 4th among the 5 pieces of data.
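Sorting by code value can be sketched the same way. Under the assumption that the symbols are laid out in the model in ascending order ('0' < '1' < '2' < '3'), the coding intervals of equal-length strings are ordered lexicographically, so sorting by code value reproduces string order without comparing characters. The five IDs below are illustrative values, not data from the text, and the model and `code_value` helper are assumptions.

```python
from fractions import Fraction
import math

# Illustrative model: probabilities and cumulative lower bounds of '0'..'3'
PROBS = {"0": Fraction(2, 10), "1": Fraction(5, 10),
         "2": Fraction(2, 10), "3": Fraction(1, 10)}
CUM = {"0": Fraction(0), "1": Fraction(2, 10),
       "2": Fraction(7, 10), "3": Fraction(9, 10)}


def code_value(text):
    """Shortest decimal inside the arithmetic-coding interval of `text`."""
    low, width = Fraction(0), Fraction(1)
    for sym in text:
        low += width * CUM[sym]
        width *= PROBS[sym]
    digits = 1
    while Fraction(math.ceil(low * 10**digits), 10**digits) >= low + width:
        digits += 1
    return Fraction(math.ceil(low * 10**digits), 10**digits)


ids = ["211123", "210312", "212132", "210231", "213000"]   # illustrative IDs
ranked = sorted(ids, key=code_value)     # ascending order of code value
print(ranked)                  # ['210231', '210312', '211123', '212132', '213000']
print(ranked.index("212132") + 1)        # 4: its code value ranks 4th of 5
```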
In existing database implementations, compression techniques and query operations are usually considered separately, that is, data storage techniques and related query optimization techniques are considered independently. An important function of a database is to store important descriptions of things of interest, such as records and related information about how things develop. Therefore, when these descriptions are lengthy, they not only occupy a large amount of storage space but also make queries inconvenient. In existing approaches, when various query operations (such as string comparison and string sorting) are performed on the data, the characters of the strings must be compared one by one, which results in low query efficiency. In contrast, the embodiment of the present invention requires no arithmetic decoding: data is compared (matched) directly according to its code value and/or sorted according to its code value, converting query operations on originally complex data types into equivalent queries on code values, which is fast and convenient.
For example, consider application scenarios in which only digits or only letters occur in the data (in which case the probability is allocated entirely to digits or entirely to letters). For example, the data may be text in a non-Latin script, such as Chinese, Korean, or Japanese, which has a corresponding phonetic notation. The text can be converted into its phonetic representation, that is, a character string containing only letters; the data is then encoded using the arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, it is determined that the number of bits of the code value is less than the number of bits of the data; and the code value is stored. The fields corresponding to the data are then sorted according to the code values.
For example, the occurrence probability distribution of letters in Chinese pinyin is: A (0.107), B (0.014), C (0.017), D (0.030), E (0.062), F (0.009), G (0.060), H (0.067), I (0.141), J (0.023), K (0.008), L (0.017), M (0.014), N (0.117), O (0.065), P (0.008), Q (0.013), R (0.006), S (0.026), T (0.015), U (0.096), V (0.001), W (0.010), X (0.020), Y (0.028), Z (0.026). As shown in Fig. 2, with the letters arranged in alphabetical order, the fields "outstanding", "good", and "pass" are encoded: the alphabetic string for "outstanding" is "youxiu", with code value 0.96684845; the alphabetic string for "good" is "lianghao", with code value 0.544375656; and the alphabetic string for "pass" is "jige", with code value 0.516228. Sorted in ascending order of code value, the values are 0.516228, 0.544375656, and 0.96684845, corresponding to "pass", "good", and "outstanding" respectively.
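The pinyin example can be reproduced approximately with the sketch below. It takes the letter probabilities listed above, lays the letters out in alphabetical order (an assumption about how the model is arranged), and sorts the fields by code value; the pinyin strings are supplied by hand because converting Chinese characters to pinyin is outside the scope of the sketch. The code values it selects are shortest decimals and will generally differ from the values quoted in the text, but the resulting order is the same; the quoted value 0.516228 for "jige" does lie inside the interval the sketch computes. The names `interval` and `code_value` are hypothetical.

```python
from fractions import Fraction
import math
import string

# Pinyin letter probabilities from the text, in alphabetical order (sum = 1)
FREQ = ("0.107 0.014 0.017 0.030 0.062 0.009 0.060 0.067 0.141 0.023 0.008 "
        "0.017 0.014 0.117 0.065 0.008 0.013 0.006 0.026 0.015 0.096 0.001 "
        "0.010 0.020 0.028 0.026")
PROBS = {ch: Fraction(p) for ch, p in zip(string.ascii_lowercase, FREQ.split())}
CUM, acc = {}, Fraction(0)
for ch in string.ascii_lowercase:        # cumulative lower bound of each letter
    CUM[ch] = acc
    acc += PROBS[ch]


def interval(text):
    """Half-open arithmetic-coding interval [low, high) of a pinyin string."""
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        low += width * CUM[ch]
        width *= PROBS[ch]
    return low, low + width


def code_value(text):
    """Shortest decimal inside the interval of `text`."""
    low, high = interval(text)
    digits = 1
    while Fraction(math.ceil(low * 10**digits), 10**digits) >= high:
        digits += 1
    return Fraction(math.ceil(low * 10**digits), 10**digits)


fields = {"outstanding": "youxiu", "good": "lianghao", "pass": "jige"}
order = sorted(fields, key=lambda f: code_value(fields[f]))
print(order)  # ['pass', 'good', 'outstanding']
```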
Alternatively, as another embodiment, the application operation includes fuzzy query; in 640, a fuzzy query is performed on the data according to the code value.
Further, as another embodiment, in 640, whether the data includes the prefix string is determined according to whether the code value falls within the coding interval of the prefix string of the required fuzzy query: when the code value is within the coding interval of the prefix string, the data includes the prefix string; when the code value is not within that coding interval, the data does not include the prefix string.
In other words, when the code value is within the coding interval of the prefix string of the required fuzzy query, the data satisfies the fuzzy query; otherwise, the data does not satisfy the fuzzy query.
Specifically, after a character string is compressed by arithmetic coding, the result is a code value. The code value is taken from the coding interval obtained by encoding the string, and the coding intervals of different strings do not intersect unless one string is a prefix of the other. Note also that the coding interval of a string is always contained in the coding interval of each of its prefix strings; for example, the coding interval of the string 'A12986572' is necessarily contained in the coding intervals of prefix strings such as 'A1298' and 'A12'. Therefore, a fuzzy query can be performed simply by determining whether a code value falls within the coding interval of the prefix string of the required fuzzy query. For example, let the integer symbol space be {0, 1, 2, 3} with occurrence probability distribution {0.2, 0.5, 0.2, 0.1}, and perform the fuzzy query %210xxx on the data 212132, 210312, 210231, and 211123. As shown in Fig. 3, the code value interval of "210" is [0.74, 0.76], and the code values corresponding to 212132, 210312, 210231, and 211123 are 0.8238, 0.7592, 0.7576, and 0.7923 respectively. Because 0.7592 and 0.7576 are within the coding interval [0.74, 0.76] of the prefix string of the fuzzy query, while 0.8238 and 0.7923 are not, 210312 and 210231 satisfy the fuzzy query, and 212132 and 211123 do not.
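The prefix test can be checked numerically with the following sketch, using the four-symbol model from the text. It computes the half-open interval of the prefix "210" (exactly [0.74, 0.76)) and tests each stored code value against it. The code values here come from the sketch's own shortest-decimal rule (0.8237, 0.7591, 0.7577, and 0.792), so they differ slightly from the values quoted in the text; the match result is the same. The names `interval`, `code_value`, and `matches_prefix` are hypothetical.

```python
from fractions import Fraction
import math

# Model from the text: symbols 0..3 with probabilities 0.2, 0.5, 0.2, 0.1
PROBS = {"0": Fraction(2, 10), "1": Fraction(5, 10),
         "2": Fraction(2, 10), "3": Fraction(1, 10)}
CUM = {"0": Fraction(0), "1": Fraction(2, 10),
       "2": Fraction(7, 10), "3": Fraction(9, 10)}


def interval(text):
    """Half-open coding interval [low, high) of `text`."""
    low, width = Fraction(0), Fraction(1)
    for sym in text:
        low += width * CUM[sym]
        width *= PROBS[sym]
    return low, low + width


def code_value(text):
    """Shortest decimal inside the coding interval of `text`."""
    low, high = interval(text)
    digits = 1
    while Fraction(math.ceil(low * 10**digits), 10**digits) >= high:
        digits += 1
    return Fraction(math.ceil(low * 10**digits), 10**digits)


def matches_prefix(code, prefix):
    """A code value matches if it lies inside the prefix's coding interval."""
    low, high = interval(prefix)
    return low <= code < high


data = ["212132", "210312", "210231", "211123"]
codes = {s: code_value(s) for s in data}
hits = [s for s in data if matches_prefix(codes[s], "210")]
print(hits)  # ['210312', '210231']
```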
Therefore, in fuzzy query operations, when the index contains more than two characters, the embodiment of the present invention can offset the operations otherwise needed to evaluate each field.
The data processing method of the embodiments of the present invention has been described in detail above with reference to Fig. 1 to Fig. 6. The embodiments of the present invention are described more fully below with reference to the specific examples of Fig. 7 to Fig. 11. It should be noted that the examples of Fig. 7 to Fig. 11 are merely intended to help a person skilled in the art understand the embodiments of the present invention, rather than to limit the embodiments to the specific values or scenarios illustrated. A person skilled in the art can obviously make various equivalent modifications or variations based on the examples of Fig. 7 to Fig. 11, and such modifications or variations also fall within the scope of the embodiments of the present invention.
Fig. 7 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method in Fig. 7 shows a process of assessing, according to the gain, whether to use arithmetic coding. As shown in Fig. 7, the method includes:
710. Probability model estimation.
720. Input a sequence.
Specifically, the sequence may be a character string of the source data, for example of a data type that is a specific combination of digits and letters, or of a data type containing only digits or only letters. For example, the input sequence may be a data sheet number, a book index number, a bus license plate number, a product document number, an IMSI or IMEI, or the phonetic notation corresponding to Chinese, Korean, Japanese, and the like.
730. Coding interval.
Specifically, the coding interval of the source data is determined according to the arithmetic coding algorithm.
740. Select a code value.
Specifically, it is determined whether a code value corresponding to the data exists in the coding interval, in other words, whether a code value can be selected. If so, step 750 is performed; if no code value can be selected, step 790 is performed.
750. Gain assessment.
Specifically, when a code value corresponding to the data exists in the code value interval, it is determined whether the number of bits of the code value meets the requirement, for example, whether the number of bits of the code value is less than the number of bits of the data.
760. Determine whether the requirement is met.
If the requirement is met, step 770 is performed; otherwise, step 780 is performed.
770. Apply.
Specifically, operations such as equality comparison, sorting, and fuzzy query can be performed according to the code value.
780. Abandon.
Specifically, arithmetic coding is abandoned.
790. Abandon.
Specifically, arithmetic coding is abandoned.
Specifically, the embodiment of the present invention determines whether a suitable code value can be obtained from the final coding interval; if not, the data is stored directly without arithmetic coding. When a code value can be obtained, it is determined whether representing the code value would require more bits than representing the original data; if so, there is no compression gain, and arithmetic coding is abandoned.
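The gain assessment of steps 740 to 790 can be sketched as follows. The sketch measures the code value by the length k of the shortest binary fraction m/2^k inside the interval, and assumes 8 bits per character for the raw data and a 64-bit limit standing in for the precision available to an implementation, so "no code value can be selected" in step 740 is modelled as "the code value would need more than 64 bits". These limits, the four-symbol model, and the function names `code_bits` and `assess` are assumptions for illustration.

```python
from fractions import Fraction
import math

# Illustrative four-symbol model (from the fuzzy-query example)
PROBS = {"0": Fraction(2, 10), "1": Fraction(5, 10),
         "2": Fraction(2, 10), "3": Fraction(1, 10)}
CUM = {"0": Fraction(0), "1": Fraction(2, 10),
       "2": Fraction(7, 10), "3": Fraction(9, 10)}


def interval(text):
    """Half-open coding interval [low, high) of `text`."""
    low, width = Fraction(0), Fraction(1)
    for sym in text:
        low += width * CUM[sym]
        width *= PROBS[sym]
    return low, low + width


def code_bits(low, high):
    """Bits k of the shortest binary fraction m / 2**k lying in [low, high)."""
    k = 1
    while Fraction(math.ceil(low * 2**k), 2**k) >= high:
        k += 1
    return k


def assess(text, max_bits=64, bits_per_char=8):
    """Return ('code', bits) or ('raw', bits) following steps 740-790."""
    low, high = interval(text)
    bits = code_bits(low, high)
    raw_bits = bits_per_char * len(text)
    if bits > max_bits:        # 740 -> 790: no code value fits the assumed precision
        return "raw", raw_bits
    if bits < raw_bits:        # 750/760 -> 770: compression gain, keep the code value
        return "code", bits
    return "raw", raw_bits     # 760 -> 780: no gain, abandon arithmetic coding


print(assess("210312")[0])     # code
print(assess("3" * 20))        # ('raw', 160)
```

If the raw data were itself packed at 2 bits per symbol, a single low-probability symbol would show the "no gain" branch: `assess("3", bits_per_char=2)` needs 4 bits for the code value and falls back to raw storage.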
Therefore, the embodiment of the present invention encodes the data using an arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, determines that the number of bits of the code value is less than the number of bits of the data; and stores the code value. By performing this gain assessment, the embodiment of the present invention can reduce erroneous encoding and reduce the storage space occupied by the data.
Fig. 8 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method in Fig. 8 shows a process of arithmetic coding and arithmetic decoding based on re-expansion of the coding interval. As shown in Fig. 8, the method includes:
810. Input a source sequence.
Specifically, the sequence may be a character string of the source data, for example of a data type that is a specific combination of digits and letters, or of a data type containing only digits or only letters. For example, the input sequence may be a data sheet number, a book index number, a bus license plate number, a product document number, an IMSI or IMEI, or the phonetic notation corresponding to Chinese, Korean, Japanese, and the like.
820. Source model.
Specifically, the source model includes the probability value of each character. The data processing device may perform arithmetic coding on the source sequence according to the source model.
830. Arithmetic coding.
Specifically, the character string of the data is encoded character by character according to the source model using the arithmetic coding algorithm.
840. Determine whether a preset condition is met.
Specifically, it is determined whether the number of characters of the data whose encoding has been completed reaches a preset character-count threshold, or whether the length of the coding interval of the data is less than a preset threshold. If so, step 850 is performed; otherwise, step 840 is performed.
850. Re-expansion of the coding interval.
When the preset condition is met, the coding interval of the data is re-expanded; according to the re-expanded coding interval, the data continues to be encoded using the arithmetic coding algorithm to obtain a code value interval, from which the code value of the source sequence is finally obtained.
For example, suppose the current data includes 12 characters and the preset threshold is 0.05. While the 12 characters of the current data are encoded using the arithmetic coding algorithm, the coding interval of the data is re-expanded once its length falls below 0.05. For example, if the coding interval after the 7th character has length 0.1 and the coding interval after the 8th character has length 0.04, the coding interval of length 0.04 at the 8th character is re-expanded, for example from 0.04 to 1 or 10. Then, according to the re-expanded coding interval (1 or 10), the data (the 9th to 12th characters) continues to be encoded using the arithmetic coding algorithm to obtain the code value interval (the coding interval corresponding to the 12th character).
860. Arithmetic decoding.
Specifically, decoding is performed according to the source model and the encoding information. For example, if interval expansion was performed during encoding, the opposite operation is likewise performed during decoding, that is, the interval is scaled.
It should be noted that all information about the interval expansion must be transmitted to the decoder: while sending the binary code value to the decoder, the encoder also sends the displacement scheme information to the decoder. This keeps the two sides synchronized and ensures that a correct decoding result is obtained.
870. Source model.
This corresponds to the source model in 820: the source model includes the probability value of each character. The device for decoding data may perform arithmetic decoding according to the source model to recover the source sequence.
880. Obtain the decoded sequence.
Specifically, the decoded sequence may be identical to the source sequence.
As the string length of the data increases, the length of the coding interval obtained by encoding becomes smaller and smaller, and a suitable code value is harder to obtain within a small coding interval. Therefore, re-expanding the interval makes it easier to obtain a suitable code value, which avoids as far as possible the problem that arithmetic coding cannot be performed correctly, avoids erroneous encoding, and achieves correct encoding. In addition, re-expanding the interval in the embodiment of the present invention allows sufficiently long string data to be represented using a limited number of digits.
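Re-expansion at the encoder and the matching re-scaling at the decoder can be sketched as follows. The text does not specify how the interval is re-expanded, so the sketch adopts one common reading, the renormalization step of practical arithmetic coders: when the interval is shorter than the threshold (0.05, as in the example above) and its leading decimal digit is already settled, that digit is emitted, the interval is scaled by 10, and the character position is recorded. The decoder receives the code value together with these positions and applies the inverse scaling at the same positions. Intervals that straddle a digit boundary are simply left unexpanded; the four-symbol model and the names `encode` and `decode` are assumptions.

```python
from fractions import Fraction
import math

# Illustrative four-symbol model; CUM holds each symbol's cumulative lower bound
PROBS = {"0": Fraction(2, 10), "1": Fraction(5, 10),
         "2": Fraction(2, 10), "3": Fraction(1, 10)}
CUM = {"0": Fraction(0), "1": Fraction(2, 10),
       "2": Fraction(7, 10), "3": Fraction(9, 10)}


def encode(text, threshold=Fraction(5, 100)):
    """Encode with re-expansion; return (code_value, re-expansion positions)."""
    low, width = Fraction(0), Fraction(1)
    digits, positions = [], []
    for pos, sym in enumerate(text, start=1):
        low += width * CUM[sym]
        width *= PROBS[sym]
        # Re-expand while the interval is narrow and its leading digit is settled
        while width < threshold:
            d = math.floor(low * 10)
            if (low + width) * 10 > d + 1:
                break                    # straddles a digit boundary: leave as is
            digits.append(d)
            positions.append(pos)
            low, width = low * 10 - d, width * 10
    k = 1                                # shortest decimal in the final interval
    while Fraction(math.ceil(low * 10**k), 10**k) >= low + width:
        k += 1
    code = Fraction(math.ceil(low * 10**k), 10**k)
    for d in reversed(digits):           # prepend the emitted digits
        code = (d + code) / 10
    return code, positions


def decode(code, length, positions):
    """Mirror the encoder, re-scaling at the transmitted positions."""
    low, width = Fraction(0), Fraction(1)
    pending = list(positions)
    out = []
    for pos in range(1, length + 1):
        for sym in sorted(PROBS):        # find the sub-interval holding the code
            sub_low = low + width * CUM[sym]
            if sub_low <= code < sub_low + width * PROBS[sym]:
                break
        else:
            raise ValueError("code value outside every sub-interval")
        out.append(sym)
        low, width = sub_low, width * PROBS[sym]
        while pending and pending[0] == pos:   # inverse of the re-expansion
            pending.pop(0)
            d = math.floor(low * 10)
            low, width, code = low * 10 - d, width * 10, code * 10 - d
    return "".join(out)


code, positions = encode("210312")
print(float(code), positions)            # 0.7591 [3, 4, 6]
print(decode(code, 6, positions))        # 210312
```

With exact fractions the re-expansion is not needed for correctness; its value shows up in a fixed-precision implementation, where the emitted digits can be streamed out and the working interval kept at a bounded width.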
Fig. 9 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method in Fig. 9 shows a process of arithmetic coding of ID-type character strings and of query operations. As shown in Fig. 9, the method includes:
910. Obtain an ID-type character string.
Specifically, an ID-type character string may be a string, such as a data sheet number, a book index number, or a bus license plate number, in which letters occur with relatively low probability and digits occur with relatively high probability. For example, letters usually occupy only 1 or 2 positions in such a string.
920. Design a summary model.
Specifically, the summary model includes the probability of each character.
930. Arithmetic coding.
Specifically, the character string of the data is encoded character by character according to the summary model using the arithmetic coding algorithm.
940. Assessment.
Specifically, it is determined whether a code value corresponding to the data exists in the coding interval, in other words, whether a code value can be selected; if so, it is determined whether the number of bits of the code value meets the requirement, for example, whether the number of bits of the code value is less than the number of bits of the data.
950. Determine whether the requirement is met.
If the requirement is met, step 970 is performed; otherwise, step 960 is performed.
960. Abandon.
970. Obtain the code value.
Specifically, the code value corresponding to the ID-type character string is determined from the code value interval.
980. Equality comparison and/or sorting.
Specifically, an equality comparison of the data can be performed according to the code value: when the code value is equal to a code value to be compared, it is determined that the data and the data corresponding to the code value to be compared are identical. Equality comparison of code values can be understood as matching of data: when two code values are equal, it can be determined that the corresponding data are identical, that is, the match succeeds; when two code values are unequal, it can be determined that the corresponding data are different, that is, the match fails.
The data can also be sorted according to the code value. For example, the position of the code value among the code values to be sorted is determined according to its magnitude, and this position indicates the position of the data among the data corresponding to the code values to be sorted. Sorting the data can be understood as ordering multiple pieces of data. For example, there are 5 pieces of data with 5 corresponding code values sorted in ascending order; if the current code value is the 4th of the 5 code values, the data corresponding to the 4th code value ranks 4th among the 5 pieces of data.
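Step 920 leaves the construction of the summary model open. One simple possibility, sketched below under stated assumptions, is to count character frequencies over a sample of existing IDs and apply add-one smoothing over an assumed alphabet of digits and uppercase letters, so that characters absent from the sample still receive a non-zero probability and remain encodable. The sample IDs and the function name `summary_model` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
import string

ALPHABET = string.digits + string.ascii_uppercase   # assumed ID alphabet


def summary_model(samples, alphabet=ALPHABET):
    """Estimate per-character probabilities with add-one smoothing."""
    counts = Counter(ch for s in samples for ch in s)
    total = sum(counts[ch] + 1 for ch in alphabet)
    return {ch: Fraction(counts[ch] + 1, total) for ch in alphabet}


ids = ["A12986572", "A12986001", "B30417763"]   # hypothetical sample IDs
model = summary_model(ids)
digit_mass = sum(model[ch] for ch in string.digits)
print(model["A"], digit_mass)   # 1/21 34/63
```

Keeping the model's symbols in character-code order (digits before uppercase letters, as here) is what lets code values of equal-length IDs sort in the same order as the IDs themselves in step 980.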
Fig. 10 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method in Fig. 10 shows a process of arithmetic coding of the alphabetic strings of fields and of field sorting. As shown in Fig. 10, the method includes:
1010. Obtain a field.
Specifically, the field may be a Chinese, Korean, or Japanese field, and so on. The embodiment of the present invention is not limited thereto; the field may also be any other field that can be converted into an alphabetic string through phonetic notation. For example, the fields may be the Chinese fields "outstanding", "good", and "pass".
1020. Alphabetic string.
Specifically, each field is converted into an alphabetic string. For example, the alphabetic strings corresponding to "outstanding", "good", and "pass" are "youxiu", "lianghao", and "jige" respectively.
1030. Phonetic alphabet probabilities.
Specifically, the probability of each letter is obtained, for example, the occurrence probability distribution of letters in Chinese pinyin: A (0.107), B (0.014), C (0.017), D (0.030), E (0.062), F (0.009), G (0.060), H (0.067), I (0.141), J (0.023), K (0.008), L (0.017), M (0.014), N (0.117), O (0.065), P (0.008), Q (0.013), R (0.006), S (0.026), T (0.015), U (0.096), V (0.001), W (0.010), X (0.020), Y (0.028), Z (0.026).
1040. Arithmetic coding.
Specifically, encoding is performed using the arithmetic coding algorithm according to the above phonetic alphabet probabilities.
1050. Assessment.
Specifically, it is determined whether a code value corresponding to the data exists in the coding interval, in other words, whether a code value can be selected; if so, it is determined whether the number of bits of the code value meets the requirement, for example, whether the number of bits of the code value is less than the number of bits of the data.
1060. Determine whether the requirement is met.
If the requirement is met, step 1080 is performed; otherwise, step 1070 is performed.
1070. Return.
1080. Obtain the code value.
Specifically, the code value corresponding to the alphabetic string is determined from the code value interval.
1090. Field sorting.
For example, with the letters arranged in alphabetical order, the fields "outstanding", "good", and "pass" are encoded: the alphabetic string for "outstanding" is "youxiu", with code value 0.96684845; the alphabetic string for "good" is "lianghao", with code value 0.544375656; and the alphabetic string for "pass" is "jige", with code value 0.516228. Sorted in ascending order of code value, the values are 0.516228, 0.544375656, and 0.96684845, corresponding to "pass", "good", and "outstanding" respectively.
Fig. 11 is a schematic flowchart of a data processing method according to another embodiment of the present invention. The method in Fig. 11 shows a process of fuzzy query based on arithmetic coding. As shown in Fig. 11, the method includes:
1110. Select an index segment.
Specifically, the prefix string segment on which the fuzzy query is to be performed is obtained. For example, the index segment is "210".
1120. Arithmetic coding.
Specifically, the index segment is encoded according to the arithmetic coding algorithm.
1130. Code value interval.
Specifically, the code value interval of the index segment is obtained. For example, with the integer symbol space {0, 1, 2, 3} and occurrence probability distribution {0.2, 0.5, 0.2, 0.1}, the code value interval of "210" is [0.74, 0.76].
1140. Obtain the code value corresponding to a sequence.
Specifically, the code value corresponding to a sequence on which the fuzzy query is to be performed is obtained. For example, the code values corresponding to 212132, 210312, 210231, and 211123 are 0.8238, 0.7592, 0.7576, and 0.7923 respectively.
1150. Check and record.
Specifically, the fuzzy query is performed according to the code value corresponding to the sequence and the code value interval of the index segment, and the result is recorded. For example, the code value interval of "210" is [0.74, 0.76], and the code values corresponding to 212132, 210312, 210231, and 211123 are 0.8238, 0.7592, 0.7576, and 0.7923 respectively. Because 0.7592 and 0.7576 are within the coding interval [0.74, 0.76] of the prefix string of the fuzzy query, while 0.8238 and 0.7923 are not, 210312 and 210231 satisfy the fuzzy query condition, and 212132 and 211123 do not.
1160. Determine whether to end.
Specifically, if so, step 1170 is performed; otherwise, step 1140 is performed to obtain the code value corresponding to another sequence.
1170. Output the result.
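Steps 1110 to 1170 check each stored sequence in turn. Because the code values are plain numbers, the same prefix query can alternatively be answered as a range scan over code values kept in sorted order, as in the sketch below, which uses Python's standard `bisect` module. This is an illustrative variant, not the procedure of Fig. 11; it assumes fixed-length IDs (so no stored string is a proper prefix of the query) and reuses the four-symbol model and the shortest-decimal code value rule, both of which are assumptions, as is the name `prefix_query`.

```python
import bisect
from fractions import Fraction
import math

# Model from the text: symbols 0..3 with probabilities 0.2, 0.5, 0.2, 0.1
PROBS = {"0": Fraction(2, 10), "1": Fraction(5, 10),
         "2": Fraction(2, 10), "3": Fraction(1, 10)}
CUM = {"0": Fraction(0), "1": Fraction(2, 10),
       "2": Fraction(7, 10), "3": Fraction(9, 10)}


def interval(text):
    """Half-open coding interval [low, high) of `text`."""
    low, width = Fraction(0), Fraction(1)
    for sym in text:
        low += width * CUM[sym]
        width *= PROBS[sym]
    return low, low + width


def code_value(text):
    """Shortest decimal inside the coding interval of `text`."""
    low, high = interval(text)
    digits = 1
    while Fraction(math.ceil(low * 10**digits), 10**digits) >= high:
        digits += 1
    return Fraction(math.ceil(low * 10**digits), 10**digits)


def prefix_query(prefix, stored):
    """stored: (code_value, data) pairs sorted by code value."""
    low, high = interval(prefix)                      # 1110-1130
    keys = [code for code, _ in stored]
    start = bisect.bisect_left(keys, low)             # first code >= low
    end = bisect.bisect_left(keys, high)              # first code >= high
    return [data for _, data in stored[start:end]]    # 1140-1170


data = ["212132", "210312", "210231", "211123"]
stored = sorted((code_value(s), s) for s in data)
print(prefix_query("210", stored))  # ['210231', '210312']
```

The design trade-off is the usual one for indexes: keeping code values sorted costs work on insertion, but each prefix query then touches only the matching range instead of every stored sequence.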
It should be noted that the examples of Fig. 7 to Fig. 11 are merely intended to help a person skilled in the art better understand the embodiments of the present invention, and are not intended to limit the scope of the embodiments of the present invention. A person skilled in the art can obviously make various equivalent modifications or variations based on the examples of Fig. 7 to Fig. 11, and such modifications or variations also fall within the scope of the embodiments of the present invention.
It should be understood that the sequence numbers of the above processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and shall not constitute any limitation on the implementation of the embodiments of the present invention.
The data processing method of the embodiments of the present invention has been described in detail above with reference to Fig. 1 to Fig. 11. The data processing device of the embodiments of the present invention is described below with reference to Fig. 12 to Fig. 19.
Fig. 12 is a schematic block diagram of a data processing device according to an embodiment of the present invention. The data processing device 1200 shown in Fig. 12 may be an encoding device and includes: a coding unit 1210, an acquiring unit 1220, a comparing unit 1230, and a first storage unit 1240.
Specifically, the coding unit 1210 is configured to encode data using an arithmetic coding algorithm to obtain a code value interval; the acquiring unit 1220 is configured to obtain a code value according to the code value interval when a code value corresponding to the data exists in the code value interval; the comparing unit 1230 is configured to compare the number of bits of the code value with the number of bits of the data to obtain a comparison result; and the first storage unit 1240 is configured to perform a storage operation according to the comparison result.
Therefore, the embodiment of the present invention encodes data using an arithmetic coding algorithm to obtain a code value interval; when a code value corresponding to the data exists in the code value interval, obtains the code value according to the code value interval; compares the number of bits of the code value with the number of bits of the data to obtain a comparison result; and performs a storage operation according to the comparison result. The embodiment of the present invention can thus reduce erroneous encoding and reduce the storage space occupied by the data.
Optionally, as another embodiment, the comparison result is that the number of bits of the code value is less than the number of bits of the data, and the first storage unit stores the code value according to the comparison result.
Alternatively, as another embodiment, the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data, and the first storage unit stores the data according to the comparison result.
Optionally, as another embodiment, the device further includes an applying unit, configured to perform an application operation on the data according to the code value, where the application operation includes at least one of equality comparison, sorting, and fuzzy query.
Optionally, as another embodiment, the data is an ID-type character string, the application operation includes equality comparison, and the applying unit performs equality comparison of the data according to the code value.
Specifically, as another embodiment, when the code value is equal to a code value to be compared, the applying unit determines that the data and the data corresponding to the code value to be compared are identical.
Alternatively, as another embodiment, the data is an ID-type character string or an alphabetic string of a field, the application operation includes sorting, and the applying unit sorts the data according to the code value.
Specifically, as another embodiment, the applying unit determines, according to the magnitude of the code value, the position of the code value among the code values to be sorted, where the position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
Alternatively, as another embodiment, the application operation includes fuzzy query, and the applying unit performs a fuzzy query on the data according to the code value.
Specifically, as another embodiment, the applying unit determines whether the data includes the prefix string according to whether the code value is within the coding interval of the prefix string of the required fuzzy query: when the code value is within that coding interval, the data includes the prefix string; when the code value is not within that coding interval, the data does not include the prefix string.
Optionally, as another embodiment, the coding unit 1210 encodes the data using the arithmetic coding algorithm to obtain a coding interval; re-expands the coding interval of the data to obtain a re-expanded coding interval; and, according to the re-expanded coding interval, continues to encode the data using the arithmetic coding algorithm to obtain the code value interval.
Further, when at least one of the following conditions is met, the coding unit 1210 re-expands the coding interval of the data to obtain the re-expanded coding interval, and records the corresponding re-expansion character position: the number of characters of the data whose encoding has been completed reaches a character-count threshold; the length of the coding interval of the data is less than an interval threshold.
Optionally, as another embodiment, the device further includes a determining unit, configured to determine, before the acquiring unit 1220 obtains the code value, whether a suitable code value corresponding to the data exists in the code value interval.
Optionally, as another embodiment, the device further includes a second storage unit, configured to store the data when no suitable code value corresponding to the data exists in the code value interval.
It should be understood that the data processing device shown in Fig. 12 corresponds to the data processing method shown in Fig. 1 and can implement each process of the method of Fig. 1. For the functions of the device shown in Fig. 12, refer to the related description of the method of Fig. 1; to avoid repetition, details are not described herein again.
Fig. 13 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1300 shown in Fig. 13 may be an encoding device and includes: a first coding unit 1310, an expanding unit 1320, a second coding unit 1330, an acquiring unit 1340, and a storage unit 1350.
Specifically, the first coding unit 1310 is configured to encode data using an arithmetic coding algorithm to obtain a coding interval; the expanding unit 1320 is configured to re-expand the coding interval of the data to obtain a re-expanded coding interval; the second coding unit 1330 is configured to continue encoding the data, according to the re-expanded coding interval, using the arithmetic coding algorithm to obtain a code value interval; the acquiring unit 1340 is configured to obtain a code value according to the code value interval; and the storage unit 1350 is configured to store the code value.
Therefore, the embodiment of the present invention re-expands the coding interval of the data. Because the coding interval is re-expanded, the code value interval is enlarged accordingly, so a suitable code value is easier to obtain from the enlarged code value interval, which avoids erroneous encoding and achieves correct encoding. In addition, re-expanding the interval allows sufficiently long string data to be represented using a limited number of digits.
Optionally, as another embodiment, expanding element 1320 is when meeting at least one of the following conditions, to data Coding section carry out re-spread exhibition, obtain the coding section after re-spread exhibition, and record corresponding re-spread exhibition character position:Volume is completed The length that the character number of the data of code reaches the coding section of character number threshold value and data is less than interval threshold.
It should be understood that the data processing device shown in Figure 13 corresponds to the data processing method shown in Figure 4 and can implement each process of that method. For the functions of the device shown in Figure 13, refer to the related description of the method of Figure 4. To avoid repetition, details are not described herein again.
Figure 14 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1400 shown in Figure 14 may be a decoding device. It includes a first acquisition unit 1410, a first decoding unit 1420, a scaling unit 1430, and a second decoding unit 1440.
Specifically, the first acquisition unit 1410 obtains the code value of data and the re-expansion character position. The first decoding unit 1420 decodes the code value of the data using an arithmetic coding algorithm to obtain a decoding interval. The scaling unit 1430 re-scales the decoding interval of the data according to the re-expansion character position to obtain a re-scaled decoding interval. Starting from the re-scaled decoding interval, the second decoding unit 1440 continues decoding using the arithmetic coding algorithm to obtain the data.
Therefore, for a code value whose coding interval was re-expanded during encoding, this embodiment of the present invention re-scales the decoding interval correspondingly. This avoids erroneous decoding and ensures correct decoding.
Optionally, as another embodiment, the scaling unit 1430 determines a re-scaling character position according to the re-expansion character position, where the re-expansion character position and the re-scaling character position correspond to each other as inverse operations. The scaling unit 1430 then re-scales the decoding interval of the data according to the re-scaling character position to obtain the re-scaled decoding interval.
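A decoder sketch in Python, assuming an encoder that emits one code value per segment and restarts from [0, 1) at each re-expansion. The re-scaling character position is taken to be the recorded re-expansion position itself. Each segment's code value is passed as an exact `Fraction`. The model (terminator plus a-z, uniform) and the function name are hypothetical. At every recorded position, the decoder discards the current interval and starts a fresh [0, 1) interval with the next code value, which undoes the encoder's re-expansion.

```python
from fractions import Fraction

# Hypothetical static model: terminator "\0" plus a-z, all equally likely.
ALPHABET = ["\0"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
P = Fraction(1, len(ALPHABET))


def decode_with_rescaling(code_values, reexpansion_positions):
    """Decode per-segment code values back into the original string.

    `reexpansion_positions[j]` is the number of characters decoded before the
    j-th re-expansion; at that point the decoding interval is re-scaled to a
    fresh [0, 1) and decoding continues with the next code value.
    """
    out = []
    ends = list(reexpansion_positions) + [None]  # the last segment ends at the terminator
    for value, end in zip(code_values, ends):
        low, high = Fraction(0), Fraction(1)  # re-scaled decoding interval
        while end is None or len(out) < end:
            width = high - low
            i = int((value - low) / width / P)  # index of the sub-interval containing value
            ch = ALPHABET[i]
            if ch == "\0":
                return "".join(out)  # terminator reached
            out.append(ch)
            low, high = low + width * i * P, low + width * (i + 1) * P
    return "".join(out)
```

For example, 29/729 is the lower end of the interval for "ab", and 1/9 is the lower end of the interval for "c" followed by the terminator. `decode_with_rescaling([Fraction(29, 729), Fraction(1, 9)], [2])` therefore rebuilds "abc".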
It should be understood that the data processing device shown in Figure 14 corresponds to the data processing method shown in Figure 5 and can implement each process of that method. For the functions of the device shown in Figure 14, refer to the related description of the method of Figure 5. To avoid repetition, details are not described herein again.
Figure 15 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1500 shown in Figure 15 may be an encoding device. It includes a coding unit 1510, an acquiring unit 1520, a storage unit 1530, and an application unit 1540.
Specifically, the coding unit 1510 encodes data using an arithmetic coding algorithm to obtain a code value interval. The acquiring unit 1520 obtains a code value according to the code value interval, and the storage unit 1530 stores the code value. The application unit 1540 performs an application operation on the data according to the code value. The application operation includes at least one of equality comparison, sorting, and fuzzy query.
Therefore, in this embodiment of the present invention, data is encoded to obtain a code value, and at least one of equality comparison, sorting, and fuzzy query is performed on the data according to the code value. Existing approaches perform equality comparison, sorting, and fuzzy query on the source data. Here, the originally complex data is instead replaced by its equivalent code value for the corresponding processing, which is fast and convenient.
Optionally, as another embodiment, the data is an ID-type character string and the application operation includes equality comparison. The application unit 1540 performs equality comparison on the data according to the code value.
Specifically, as another embodiment, when the code value is equal to a code value to be compared, the application unit 1540 determines that the data and the data corresponding to the code value to be compared are identical.
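Equal code values can stand for identical strings for two reasons: the code value is chosen deterministically from its interval, and a terminator symbol makes the intervals of distinct strings disjoint. The Python sketch below shows an equality lookup under that assumption. The uniform model (terminator plus a-z), `code_value`, the in-memory `index`, and `find` are all hypothetical names for illustration.

```python
import math
from fractions import Fraction

# Hypothetical static model: terminator "\0" ordered first, then a-z, all equally likely.
ALPHABET = ["\0"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
P = Fraction(1, len(ALPHABET))


def code_value(text):
    """Arithmetic-code `text` plus terminator; return the shortest binary fraction in its interval."""
    low, high = Fraction(0), Fraction(1)
    for ch in text + "\0":
        i = ALPHABET.index(ch)
        width = high - low
        low, high = low + width * i * P, low + width * (i + 1) * P
    nbits = 0
    while True:
        scale = 2 ** nbits
        k = math.ceil(low * scale)  # smallest numerator with k / scale >= low
        if Fraction(k, scale) < high:
            return Fraction(k, scale)
        nbits += 1


# Hypothetical in-memory index keyed by code value instead of by the ID string.
index = {code_value(user_id): user_id for user_id in ["alice", "bob", "carol"]}


def find(user_id):
    """Equality comparison on code values: return the stored ID or None."""
    return index.get(code_value(user_id))
```

Here `find("bob")` returns "bob" and `find("bobby")` returns None. Only code values are compared, never the stored strings.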
Alternatively, as another embodiment, the data is an ID-type character string or an alphabetic field character string and the application operation includes sorting. The application unit 1540 sorts the data according to the code value.
Specifically, as another embodiment, the application unit 1540 determines the position of the code value among the code values to be sorted according to the magnitude of the code value. The position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
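Sorting by code value reproduces lexicographic order only under two conditions. The model must assign symbol sub-intervals in character order, and the terminator must sort below every letter. Then the code value intervals appear in [0, 1) in lexicographic order. The Python sketch assumes such a model (terminator first, then a-z, uniform). The model and the function names are illustrative, not taken from the patent.

```python
import math
from fractions import Fraction

# Hypothetical static model: terminator "\0" ordered first, then a-z, all equally likely.
ALPHABET = ["\0"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
P = Fraction(1, len(ALPHABET))


def code_value(text):
    """Arithmetic-code `text` plus terminator; return the shortest binary fraction in its interval."""
    low, high = Fraction(0), Fraction(1)
    for ch in text + "\0":
        i = ALPHABET.index(ch)
        width = high - low
        low, high = low + width * i * P, low + width * (i + 1) * P
    nbits = 0
    while True:
        scale = 2 ** nbits
        k = math.ceil(low * scale)  # smallest numerator with k / scale >= low
        if Fraction(k, scale) < high:
            return Fraction(k, scale)
        nbits += 1


def sort_by_code_value(strings):
    """Sort by code value; each code value's position gives its string's position."""
    return sorted(strings, key=code_value)
```

Here `sort_by_code_value(["banana", "apple", "app", "cherry", "b"])` returns `['app', 'apple', 'b', 'banana', 'cherry']`, the same order that Python's `sorted` gives on the strings themselves. Because the terminator ranks lowest, "app" precedes "apple" and "b" precedes "banana".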
Alternatively, as another embodiment, the application operation includes fuzzy query, and the application unit 1540 performs fuzzy query on the data according to the code value.
Specifically, as another embodiment, the application unit 1540 checks whether the code value falls within the coding interval of the prefix character string of the required fuzzy query. If it does, the data begins with the prefix character string. If it does not, the data does not begin with the prefix character string.
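The prefix test works because every string that begins with a given prefix is encoded inside the prefix's coding interval, and every other string lands outside it. The Python sketch encodes the prefix without the terminator and checks interval membership. It reuses a hypothetical uniform model (terminator plus a-z), and `interval`, `code_value`, and `has_prefix` are illustrative names only.

```python
import math
from fractions import Fraction

# Hypothetical static model: terminator "\0" ordered first, then a-z, all equally likely.
ALPHABET = ["\0"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
P = Fraction(1, len(ALPHABET))


def interval(text, terminate):
    """Coding interval of `text`; a query prefix is encoded without the terminator."""
    low, high = Fraction(0), Fraction(1)
    for ch in text + ("\0" if terminate else ""):
        i = ALPHABET.index(ch)
        width = high - low
        low, high = low + width * i * P, low + width * (i + 1) * P
    return low, high


def code_value(text):
    """Shortest binary fraction inside the (terminated) coding interval of `text`."""
    low, high = interval(text, terminate=True)
    nbits = 0
    while True:
        scale = 2 ** nbits
        k = math.ceil(low * scale)  # smallest numerator with k / scale >= low
        if Fraction(k, scale) < high:
            return Fraction(k, scale)
        nbits += 1


def has_prefix(code, prefix):
    """Prefix match (like LIKE 'prefix%'): the code value lies in the prefix's interval."""
    low, high = interval(prefix, terminate=False)
    return low <= code < high
```

With `c = code_value("database")`, `has_prefix(c, "data")` is True and `has_prefix(c, "date")` is False, and the stored string is never decoded.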
It should be understood that the data processing device shown in Figure 15 corresponds to the data processing method shown in Figure 6 and can implement each process of that method. For the functions of the device shown in Figure 15, refer to the related description of the method of Figure 6. To avoid repetition, details are not described herein again.
Figure 16 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1600 shown in Figure 16 may be an encoding device. It includes a processor 1610, a memory 1620, and a bus system 1630.
Specifically, the processor 1610 invokes, through the bus system 1630, code stored in the memory 1620 to perform the following steps. It encodes data using an arithmetic coding algorithm to obtain a code value interval. When a code value corresponding to the data exists in the code value interval, it obtains the code value according to the code value interval. It compares the number of bits of the code value with the number of bits of the data to obtain a comparison result. Finally, it performs a storage operation according to the comparison result.
In this embodiment of the present invention, data is encoded using an arithmetic coding algorithm to obtain a code value interval. When a code value corresponding to the data exists in the code value interval, the code value is obtained according to the code value interval. The number of bits of the code value is then compared with the number of bits of the data, and a storage operation is performed according to the comparison result. This embodiment can reduce erroneous coding and reduce the storage space occupied by data.
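The bit-count comparison can be made concrete in standard arithmetic-coding notation. This notation is an illustration and does not come from the patent. Let p(s) be the model probability of symbol s, and let F(s) be the total probability of the symbols ordered before it. Encoding the symbols s_1, ..., s_m narrows the interval as shown below. Suppose the code value is taken to be the shortest binary fraction in the final interval (an assumption). It then needs at most the number of bits given by the bound, because an interval of width w always contains a multiple of 2^{-n} once 2^{-n} <= w:

```latex
\begin{aligned}
L_i &= L_{i-1} + (H_{i-1} - L_{i-1})\,F(s_i), \qquad
H_i = L_{i-1} + (H_{i-1} - L_{i-1})\,\bigl(F(s_i) + p(s_i)\bigr), \qquad [L_0, H_0) = [0, 1) \\
H_m - L_m &= \prod_{i=1}^{m} p(s_i)
\;\Longrightarrow\;
n_{\text{code}} \le \left\lceil -\log_2 (H_m - L_m) \right\rceil
= \left\lceil \sum_{i=1}^{m} -\log_2 p(s_i) \right\rceil \\
&\text{store the code value if } n_{\text{code}} < n_{\text{data}}\text{; otherwise store the data.}
\end{aligned}
```

Strings whose symbols are likely under the model therefore get short code values. Strings of unlikely symbols may need more bits than the raw data, and in that case the data is stored instead.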
The methods disclosed in the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 1610. The processor 1610 may be an integrated circuit chip with signal processing capability. During implementation, each step of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 1610 or by instructions in the form of software. The processor 1610 may be any of the following: a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well established in the art, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1620. The processor 1610 reads information from the memory 1620 and completes the steps of the foregoing methods in combination with its hardware. In addition to a data bus, the bus system 1630 may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, the various buses are all labeled as the bus system 1630 in the figure.
Optionally, as another embodiment, when the comparison result is that the number of bits of the code value is less than the number of bits of the data, the processor 1610 stores the code value according to the comparison result.
Alternatively, as another embodiment, when the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data, the processor 1610 stores the data according to the comparison result.
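The two branches can be sketched in Python. Several choices here are assumptions made for illustration: the model (terminator plus a-z, uniform), the shortest binary fraction as the code value, and raw data measured as UTF-8 bytes. `choose_storage` is a hypothetical name. Characters outside the model's alphabet raise `ValueError`.

```python
import math
from fractions import Fraction

# Hypothetical static model: terminator "\0" plus a-z, all equally likely.
ALPHABET = ["\0"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
P = Fraction(1, len(ALPHABET))


def encode_interval(text):
    """Narrow [0, 1) once per symbol (terminator included) to get the code value interval."""
    low, high = Fraction(0), Fraction(1)
    for ch in text + "\0":
        i = ALPHABET.index(ch)
        width = high - low
        low, high = low + width * i * P, low + width * (i + 1) * P
    return low, high


def shortest_code_value(low, high):
    """Return (k, nbits): k / 2**nbits is the shortest binary fraction in [low, high)."""
    nbits = 0
    while True:
        scale = 2 ** nbits
        k = math.ceil(low * scale)  # smallest numerator with k / scale >= low
        if Fraction(k, scale) < high:
            return k, nbits
        nbits += 1


def choose_storage(text):
    """Store the code value only when it needs fewer bits than the raw data."""
    k, code_bits = shortest_code_value(*encode_interval(text))
    data_bits = 8 * len(text.encode("utf-8"))  # assumption: raw data is stored as UTF-8 bytes
    if code_bits < data_bits:
        return ("code", k, code_bits)
    return ("data", text, data_bits)
```

Here `choose_storage("hello")` stores a code value of at most 29 bits instead of 40 bits of text. `choose_storage("a")` returns `("data", "a", 8)`, because the code value of "a" needs 9 bits. A real column would also need to record `code_bits`, or use a fixed width, to read the value back. That bookkeeping is outside this sketch.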
Optionally, as another embodiment, the processor 1610 is further configured to perform an application operation on the data according to the code value. The application operation includes at least one of equality comparison, sorting, and fuzzy query.
Optionally, as another embodiment, the data is an ID-type character string and the application operation includes equality comparison. The processor 1610 performs equality comparison on the data according to the code value.
Specifically, as another embodiment, when the code value is equal to a code value to be compared, the processor 1610 determines that the data and the data corresponding to the code value to be compared are identical.
Alternatively, as another embodiment, the data is an ID-type character string or an alphabetic field character string and the application operation includes sorting. The processor 1610 sorts the data according to the code value.
Specifically, as another embodiment, the processor 1610 determines the position of the code value among the code values to be sorted according to the magnitude of the code value. The position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
Alternatively, as another embodiment, the application operation includes fuzzy query, and the processor 1610 performs fuzzy query on the data according to the code value.
Specifically, as another embodiment, the processor 1610 checks whether the code value falls within the coding interval of the prefix character string of the required fuzzy query. If it does, the data begins with the prefix character string. If it does not, the data does not begin with the prefix character string.
Optionally, as another embodiment, the processor 1610 encodes data using an arithmetic coding algorithm to obtain a coding interval, and then re-expands the coding interval of the data to obtain a re-expanded coding interval. Starting from the re-expanded coding interval, it continues encoding the data using the arithmetic coding algorithm to obtain a code value interval.
Further, when at least one of the following conditions is met, the processor 1610 re-expands the coding interval of the data to obtain the re-expanded coding interval and records the corresponding re-expansion character position: the number of characters of the data that have been encoded reaches a character-count threshold, or the length of the coding interval of the data is less than an interval threshold.
Optionally, as another embodiment, before obtaining the code value, the processor 1610 determines whether a suitable code value corresponding to the data exists in the code value interval.
Optionally, as another embodiment, when no suitable code value corresponding to the data exists in the code value interval, the processor 1610 stores the data.
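The text does not define "suitable". One possible reading is that the code value must fit the bit width reserved for it. Under that assumption, the Python sketch below stops the search at `max_bits` and falls back to storing the raw data when no binary fraction of at most that length lies in the interval. `max_bits`, `suitable_code_value`, `store`, and the uniform terminator-plus-a-z model are all hypothetical.

```python
import math
from fractions import Fraction

# Hypothetical static model: terminator "\0" plus a-z, all equally likely.
ALPHABET = ["\0"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
P = Fraction(1, len(ALPHABET))


def encode_interval(text):
    """Narrow [0, 1) once per symbol (terminator included) to get the code value interval."""
    low, high = Fraction(0), Fraction(1)
    for ch in text + "\0":
        i = ALPHABET.index(ch)
        width = high - low
        low, high = low + width * i * P, low + width * (i + 1) * P
    return low, high


def suitable_code_value(low, high, max_bits):
    """Shortest k / 2**n in [low, high) with n <= max_bits, or None if none exists."""
    for nbits in range(max_bits + 1):
        scale = 2 ** nbits
        k = math.ceil(low * scale)  # smallest numerator with k / scale >= low
        if Fraction(k, scale) < high:
            return k, nbits
    return None


def store(text, max_bits=64):
    """Store the code value if a suitable one exists; otherwise fall back to the raw data."""
    found = suitable_code_value(*encode_interval(text), max_bits)
    if found is None:
        return ("data", text)
    k, nbits = found
    return ("code", k, nbits)
```

For example, `store("a", max_bits=8)` returns `("data", "a")` because the shortest code value for "a" needs 9 bits. `store("a", max_bits=16)` returns `("code", 19, 9)`, which is the value 19/512.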
It should be understood that the data processing device shown in Figure 16 corresponds to the data processing method shown in Figure 1 and can implement each process of that method. For the functions of the device shown in Figure 16, refer to the related description of the method of Figure 1. To avoid repetition, details are not described herein again.
Figure 17 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1700 shown in Figure 17 may be an encoding device. It includes a processor 1710, a memory 1720, and a bus system 1730.
Specifically, the processor 1710 invokes, through the bus system 1730, code stored in the memory 1720 to perform the following steps. It encodes data using an arithmetic coding algorithm to obtain a coding interval, and re-expands the coding interval of the data to obtain a re-expanded coding interval. Starting from the re-expanded coding interval, it continues encoding the data using the arithmetic coding algorithm to obtain a code value interval. It then obtains a code value according to the code value interval and stores the code value.
Therefore, this embodiment of the present invention re-expands the coding interval of the data. Because the coding interval is re-expanded, the code value interval is enlarged accordingly, so a suitable code value is easier to find within it. This avoids erroneous coding and ensures correct coding. In addition, interval re-expansion allows a sufficiently long character string to be represented in a space of limited bit width.
The methods disclosed in the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 1710. The processor 1710 may be an integrated circuit chip with signal processing capability. During implementation, each step of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 1710 or by instructions in the form of software. The processor 1710 may be any of the following: a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well established in the art, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1720. The processor 1710 reads information from the memory 1720 and completes the steps of the foregoing methods in combination with its hardware. In addition to a data bus, the bus system 1730 may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, the various buses are all labeled as the bus system 1730 in the figure.
Optionally, as another embodiment, the processor 1710 re-expands the coding interval of the data to obtain the re-expanded coding interval, and records the corresponding re-expansion character position, when at least one of the following conditions is met: the number of characters of the data that have been encoded reaches a character-count threshold, or the length of the coding interval of the data is less than an interval threshold.
It should be understood that the data processing device shown in Figure 17 corresponds to the data processing method shown in Figure 4 and can implement each process of that method. For the functions of the device shown in Figure 17, refer to the related description of the method of Figure 4. To avoid repetition, details are not described herein again.
Figure 18 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1800 shown in Figure 18 may be a decoding device. It includes a processor 1810, a memory 1820, and a bus system 1830.
Specifically, the processor 1810 invokes, through the bus system 1830, code stored in the memory 1820 to perform the following steps. It obtains the code value of data and the re-expansion character position. It decodes the code value of the data using an arithmetic coding algorithm to obtain a decoding interval, and re-scales the decoding interval of the data according to the re-expansion character position to obtain a re-scaled decoding interval. Starting from the re-scaled decoding interval, it continues decoding using the arithmetic coding algorithm to obtain the data.
Therefore, for a code value whose coding interval was re-expanded during encoding, this embodiment of the present invention re-scales the decoding interval correspondingly. This avoids erroneous decoding and ensures correct decoding.
The methods disclosed in the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 1810. The processor 1810 may be an integrated circuit chip with signal processing capability. During implementation, each step of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 1810 or by instructions in the form of software. The processor 1810 may be any of the following: a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well established in the art, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1820. The processor 1810 reads information from the memory 1820 and completes the steps of the foregoing methods in combination with its hardware. In addition to a data bus, the bus system 1830 may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, the various buses are all labeled as the bus system 1830 in the figure.
Optionally, as another embodiment, the processor 1810 determines a re-scaling character position according to the re-expansion character position, where the re-expansion character position and the re-scaling character position correspond to each other as inverse operations. The processor 1810 then re-scales the decoding interval of the data according to the re-scaling character position to obtain the re-scaled decoding interval.
It should be understood that the data processing device shown in Figure 18 corresponds to the data processing method shown in Figure 5 and can implement each process of that method. For the functions of the device shown in Figure 18, refer to the related description of the method of Figure 5. To avoid repetition, details are not described herein again.
Figure 19 is a schematic block diagram of a data processing device according to another embodiment of the present invention. The data processing device 1900 shown in Figure 19 may be an encoding device. It includes a processor 1910, a memory 1920, and a bus system 1930.
Specifically, the processor 1910 invokes, through the bus system 1930, code stored in the memory 1920 to perform the following steps. It encodes data using an arithmetic coding algorithm to obtain a code value interval, obtains a code value according to the code value interval, and stores the code value. It then performs an application operation on the data according to the code value. The application operation includes at least one of equality comparison, sorting, and fuzzy query.
Therefore, in this embodiment of the present invention, data is encoded to obtain a code value, and at least one of equality comparison, sorting, and fuzzy query is performed on the data according to the code value. Existing approaches perform equality comparison, sorting, and fuzzy query on the source data. Here, the originally complex data is instead replaced by its equivalent code value for the corresponding processing, which is fast and convenient.
The methods disclosed in the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 1910. The processor 1910 may be an integrated circuit chip with signal processing capability. During implementation, each step of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 1910 or by instructions in the form of software. The processor 1910 may be any of the following: a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well established in the art, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1920. The processor 1910 reads information from the memory 1920 and completes the steps of the foregoing methods in combination with its hardware. In addition to a data bus, the bus system 1930 may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, the various buses are all labeled as the bus system 1930 in the figure.
Optionally, as another embodiment, the data is an ID-type character string and the application operation includes equality comparison. The processor 1910 performs equality comparison on the data according to the code value.
Specifically, as another embodiment, when the code value is equal to a code value to be compared, the processor 1910 determines that the data and the data corresponding to the code value to be compared are identical.
Alternatively, as another embodiment, the data is an ID-type character string or an alphabetic field character string and the application operation includes sorting. The processor 1910 sorts the data according to the code value.
Specifically, as another embodiment, the processor 1910 determines the position of the code value among the code values to be sorted according to the magnitude of the code value. The position of the code value indicates the position of the data among the data corresponding to the code values to be sorted.
Alternatively, as another embodiment, the application operation includes fuzzy query, and the processor 1910 performs fuzzy query on the data according to the code value.
Specifically, as another embodiment, the processor 1910 checks whether the code value falls within the coding interval of the prefix character string of the required fuzzy query. If it does, the data begins with the prefix character string. If it does not, the data does not begin with the prefix character string.
It should be understood that the data processing device shown in Figure 19 corresponds to the data processing method shown in Figure 6 and can implement each process of that method. For the functions of the device shown in Figure 19, refer to the related description of the method of Figure 6. To avoid repetition, details are not described herein again.
It should be understood that "one embodiment" or "an embodiment" mentioned throughout the specification means that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present invention. Therefore, occurrences of "in one embodiment" or "in an embodiment" throughout the specification do not necessarily refer to the same embodiment. In addition, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should also be understood that, in the various embodiments of the present invention, the sequence numbers of the foregoing processes do not imply an execution order. The execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of the present invention.
In addition, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein describes only an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean that only A exists, that both A and B exist, or that only B exists. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
It should be understood that, in the embodiments of the present invention, "B corresponding to A" indicates that B is associated with A and that B can be determined according to A. However, it should also be understood that determining B according to A does not mean that B is determined only according to A; B may also be determined according to A and/or other information.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly describe the interchangeability between hardware and software, the compositions and steps of each example have been described above generally in terms of their functions. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered as going beyond the scope of the present invention.
It can be clearly understood by a person skilled in the art that, for convenience and brevity of description, the specific working processes of the foregoing systems, apparatuses, and units are described in the corresponding processes of the foregoing method embodiments, and details are not described herein again.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely exemplary. The division into units is merely a division by logical function, and other divisions may be used in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Based on the foregoing descriptions of the embodiments, a person skilled in the art can clearly understand that the present invention may be implemented by hardware, firmware, or a combination thereof. When implemented in software, the foregoing functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium accessible by a computer. By way of example but not limitation, computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection may properly be termed a computer-readable medium. For example, suppose the software is transmitted from a website, server, or other remote source using a coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave. Then the coaxial cable, optical fiber cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of the medium. As used in the present invention, disks (Disk) and discs (disc) include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), floppy disks, and Blu-ray discs. Disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the protection scope of computer-readable media.
In short, the foregoing describes merely preferred embodiments of the technical solutions of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (30)

1. A data processing method, comprising:
encoding data by using an arithmetic coding algorithm to obtain a code value interval;
when a code value corresponding to the data exists in the code value interval, obtaining the code value according to the code value interval;
comparing the number of bits of the code value with the number of bits of the data to obtain a comparison result; and
performing a storage operation in a database according to the comparison result.
2. The method according to claim 1, wherein the comparison result is that the number of bits of the code value is less than the number of bits of the data,
and the performing a storage operation in a database according to the comparison result comprises:
storing the code value in the database according to the comparison result.
3. The method according to claim 1, wherein the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data,
and the performing a storage operation in a database according to the comparison result comprises:
storing the data in the database according to the comparison result.
4. The method according to claim 2, further comprising:
performing an application operation on the data according to the code value, wherein the application operation comprises at least one of equality comparison, sorting, and fuzzy query.
5. The method according to claim 4, wherein the data is an identifier (ID)-type character string, the application operation comprises equality comparison, and the performing an application operation on the data according to the code value comprises:
when the code value is equal to a code value to be compared, determining that the data and the data corresponding to the code value to be compared are identical.
6. The method according to claim 4, wherein the data is an ID-type character string or an alphabetic field character string, the application operation comprises sorting, and the performing an application operation on the data according to the code value comprises:
determining, according to the magnitude of the code value, a position of the code value among code values to be sorted, wherein the position of the code value indicates a position of the data among the data corresponding to the code values to be sorted.
7. The method according to claim 4, wherein the application operation comprises fuzzy query, and the performing an application operation on the data according to the code value comprises:
determining, according to whether the code value falls within the coding interval of a prefix character string of the required fuzzy query, whether the data includes the prefix character string,
wherein when the code value falls within the coding interval of the prefix character string of the required fuzzy query, the data includes the prefix character string; and
when the code value does not fall within the coding interval of the prefix character string of the required fuzzy query, the data does not include the prefix character string.
8. The method according to any one of claims 1 to 7, wherein the encoding data by using an arithmetic coding algorithm to obtain a code value interval comprises:
encoding the data by using the arithmetic coding algorithm to obtain a coding interval;
re-expanding the coding interval of the data to obtain a re-expanded coding interval; and
continuing, according to the re-expanded coding interval, to encode the data by using the arithmetic coding algorithm to obtain the code value interval.
9. The method according to claim 8, wherein the re-expanding the coding interval of the data to obtain a re-expanded coding interval comprises:
when at least one of the following conditions is met, re-expanding the coding interval of the data to obtain the re-expanded coding interval, and recording the corresponding re-expansion character position:
the number of characters of the data that have been encoded reaches a character-count threshold; and the length of the coding interval of the data is less than an interval threshold.
10. The method according to any one of claims 1 to 7, further comprising, before the obtaining the code value according to the code value interval when a code value corresponding to the data exists in the code value interval:
determining whether a suitable code value corresponding to the data exists in the code value interval.
11. The method according to claim 10, further comprising:
when no suitable code value corresponding to the data exists in the code value interval, storing the data.
12. A data processing method, comprising:
encoding data by using an arithmetic coding algorithm to obtain a code value interval;
obtaining a code value according to the code value interval;
storing the code value in a database; and
performing an application operation on the data according to the code value, wherein the application operation comprises at least one of equality comparison, sorting, and fuzzy query;
wherein the storing the code value in a database comprises:
comparing the number of bits of the code value with the number of bits of the data to obtain a comparison result; and
storing the code value in the database according to the comparison result.
13. The method according to claim 12, wherein the data is an ID-type character string, the application operation comprises equality comparison, and the performing an application operation on the data according to the code value comprises:
when the code value is equal to a code value to be compared, determining that the data and the data corresponding to the code value to be compared are identical.
14. The method according to claim 12, wherein the data is an ID-type character string or an alphabetic field character string, the application operation comprises sorting, and the performing an application operation on the data according to the code value comprises:
determining, according to the magnitude of the code value, a position of the code value among code values to be sorted, wherein the position of the code value indicates a position of the data among the data corresponding to the code values to be sorted.
15. The method according to claim 12, wherein the application operation comprises fuzzy query, and the performing an application operation on the data according to the code value comprises:
determining, according to whether the code value falls within the coding interval of a prefix character string of the required fuzzy query, whether the data includes the prefix character string,
wherein when the code value falls within the coding interval of the prefix character string of the required fuzzy query, the data includes the prefix character string; and
when the code value does not fall within the coding interval of the prefix character string of the required fuzzy query, the data does not include the prefix character string.
16. A data processing device, comprising:
a coding unit, configured to encode data by using an arithmetic coding algorithm to obtain a code value interval;
an acquiring unit, configured to obtain the code value according to the code value interval when a code value corresponding to the data exists in the code value interval;
a comparing unit, configured to compare the number of bits of the code value with the number of bits of the data to obtain a comparison result; and
a first storage unit, configured to perform a storage operation in a database according to the comparison result.
17. The device according to claim 16, wherein the comparison result is that the number of bits of the code value is less than the number of bits of the data,
and the first storage unit stores the code value in the database according to the comparison result.
18. The device according to claim 16, wherein the comparison result is that the number of bits of the code value is greater than or equal to the number of bits of the data,
and the first storage unit stores the data in the database according to the comparison result.
19. The device according to claim 17, further comprising:
an applying unit, configured to perform an application operation on the data according to the code value, wherein the application operation comprises at least one of equality comparison, sorting, and fuzzy query.
20. The device according to claim 19, wherein the data is an ID-type character string, the application operation comprises equality comparison, and the applying unit is configured to determine, when the code value is equal to a code value to be compared, that the data and the data corresponding to the code value to be compared are identical.
21. The device according to claim 19, wherein the data is an ID-type character string or an alphabetic field character string, the application operation comprises sorting, and the applying unit is configured to determine, according to the magnitude of the code value, a position of the code value among code values to be sorted, wherein the position of the code value indicates a position of the data among the data corresponding to the code values to be sorted.
22. The device according to claim 19, wherein the application operation comprises fuzzy query, and the applying unit is configured to determine, according to whether the code value falls within the coding interval of a prefix character string of the required fuzzy query, whether the data includes the prefix character string,
wherein when the code value falls within the coding interval of the prefix character string of the required fuzzy query, the data includes the prefix character string; and
when the code value does not fall within the coding interval of the prefix character string of the required fuzzy query, the data does not include the prefix character string.
23. The device according to any one of claims 16 to 22, wherein the coding unit is configured to: encode the data by using the arithmetic coding algorithm to obtain a coding interval; re-expand the coding interval of the data to obtain a re-expanded coding interval; and continue, according to the re-expanded coding interval, to encode the data by using the arithmetic coding algorithm to obtain the code value interval.
24. The device according to claim 23, wherein the coding unit is configured to, when at least one of the following conditions is met, re-expand the coding interval of the data to obtain the re-expanded coding interval and record the corresponding re-expansion character position:
the number of characters of the data that have been encoded reaches a character-count threshold; and the length of the coding interval of the data is less than an interval threshold.
25. The device according to any one of claims 16 to 22, further comprising:
a determining unit, configured to determine, before the acquiring unit obtains the code value, whether a suitable code value corresponding to the data exists in the code value interval.
26. The device according to claim 25, further comprising:
a second storage unit, configured to store the data when no suitable code value corresponding to the data exists in the code value interval.
27. A data processing device, comprising:
a coding unit, configured to encode data by using an arithmetic coding algorithm to obtain a code value interval;
an acquiring unit, configured to obtain a code value according to the code value interval;
a storage unit, configured to store the code value in a database; and
an applying unit, configured to perform an application operation on the data according to the code value, wherein the application operation comprises at least one of equality comparison, sorting, and fuzzy query;
wherein the storage unit is specifically configured to:
compare the number of bits of the code value with the number of bits of the data to obtain a comparison result; and
store the code value in the database according to the comparison result.
28. The device according to claim 27, wherein the data is an ID-type character string, the application operation comprises equality comparison, and the applying unit is configured to determine, when the code value is equal to a code value to be compared, that the data and the data corresponding to the code value to be compared are identical.
29. The device according to claim 27, wherein the data is an ID-type character string or an alphabetic field character string, the application operation comprises sorting, and the applying unit is configured to determine, according to the magnitude of the code value, a position of the code value among code values to be sorted, wherein the position of the code value indicates a position of the data among the data corresponding to the code values to be sorted.
30. The device according to claim 27, wherein the application operation comprises fuzzy query, and the applying unit is configured to determine, according to whether the code value falls within the coding interval of a prefix character string of the required fuzzy query, whether the data includes the prefix character string,
wherein when the code value falls within the coding interval of the prefix character string of the required fuzzy query, the data includes the prefix character string; and
when the code value does not fall within the coding interval of the prefix character string of the required fuzzy query, the data does not include the prefix character string.
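The flow in claims 1 to 7 and 10 to 15 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a hypothetical static model (an end-of-string symbol ranked lowest, then digits and lowercase letters, all equally likely), exact rational arithmetic via `fractions.Fraction`, and a hypothetical `MAX_CODE_BITS = 64` cap standing in for the precision limit that decides whether a "suitable" code value exists. The names `interval`, `shortest_code`, `store`, `code_value`, `equal`, and `has_prefix` are illustrative only, and the interval re-expansion of claims 8 and 9 is omitted because exact arithmetic does not need it.

```python
from __future__ import annotations

from fractions import Fraction
from math import ceil

# Hypothetical static model (assumption, not taken from the patent): an
# end-of-string marker ranked lowest, then digits and lowercase letters in
# ASCII order, all equally likely. Ranking EOS first makes a terminated
# string's interval lie below the intervals of its own extensions.
EOS = "\0"
ALPHABET = EOS + "0123456789abcdefghijklmnopqrstuvwxyz"
P = Fraction(1, len(ALPHABET))
MAX_CODE_BITS = 64  # hypothetical precision limit for a "suitable" code value


def interval(text: str, terminate: bool = True) -> tuple[Fraction, Fraction]:
    """Arithmetic-code `text` into its interval [low, high) within [0, 1)."""
    low, width = Fraction(0), Fraction(1)
    for ch in text + (EOS if terminate else ""):
        low += width * P * ALPHABET.index(ch)  # ValueError if ch is not modeled
        width *= P
    return low, low + width


def shortest_code(low: Fraction, high: Fraction) -> tuple[int, int] | None:
    """Return (m, k): the code value m / 2**k in [low, high) with the fewest bits k."""
    for k in range(1, MAX_CODE_BITS + 1):
        m = ceil(low * 2**k)  # smallest k-bit binary fraction that is >= low
        if Fraction(m, 2**k) < high:
            return m, k
    return None  # no suitable code value within the precision limit


def store(text: str) -> tuple[str, object]:
    """Store the code value only if it needs fewer bits than the raw data."""
    code = shortest_code(*interval(text))
    data_bits = 8 * len(text.encode("utf-8"))
    if code is not None and code[1] < data_bits:
        return "code", code
    return "data", text


def code_value(code: tuple[int, int]) -> Fraction:
    m, k = code
    return Fraction(m, 2**k)


def equal(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Equality comparison: distinct strings occupy disjoint intervals."""
    return code_value(a) == code_value(b)


def has_prefix(code: tuple[int, int], prefix: str) -> bool:
    """Prefix (fuzzy) query: is the code value inside the prefix's own interval?"""
    low, high = interval(prefix, terminate=False)
    return low <= code_value(code) < high


names = ["user2", "user10", "user1", "admin"]
codes = {name: store(name)[1] for name in names}  # each code is shorter than its data
print(sorted(names, key=lambda n: code_value(codes[n])))  # ['admin', 'user1', 'user10', 'user2']
print(has_prefix(codes["user10"], "user1"), has_prefix(codes["user2"], "user1"))  # True False
print(store("a"))  # ('data', 'a'): its 10-bit code value is not shorter than 8 data bits
```

Under this model every terminated string maps to its own disjoint sub-interval, and the sub-intervals are ordered the same way as the strings, so any code value chosen inside each interval preserves equality, ordering, and prefix membership. That is what allows the equality, sorting, and fuzzy-query operations to run on stored code values without decoding them.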
CN201510059809.6A 2015-02-04 2015-02-04 A kind of method and apparatus of data processing Active CN104579360B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201510059809.6A CN104579360B (en) 2015-02-04 2015-02-04 A kind of method and apparatus of data processing
EP16746065.8A EP3244540A4 (en) 2015-02-04 2016-01-13 Data processing method and device
PCT/CN2016/070805 WO2016124070A1 (en) 2015-02-04 2016-01-13 Data processing method and device
US15/668,335 US9998145B2 (en) 2015-02-04 2017-08-03 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510059809.6A CN104579360B (en) 2015-02-04 2015-02-04 A kind of method and apparatus of data processing

Publications (2)

Publication Number Publication Date
CN104579360A CN104579360A (en) 2015-04-29
CN104579360B true CN104579360B (en) 2018-07-31

Family

ID=53094691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510059809.6A Active CN104579360B (en) 2015-02-04 2015-02-04 A kind of method and apparatus of data processing

Country Status (4)

Country Link
US (1) US9998145B2 (en)
EP (1) EP3244540A4 (en)
CN (1) CN104579360B (en)
WO (1) WO2016124070A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104579360B (en) 2015-02-04 2018-07-31 华为技术有限公司 A kind of method and apparatus of data processing
CN106484753B (en) * 2016-06-07 2020-01-03 湖南千年华光软件开发有限公司 Data processing method
CN110326253B (en) * 2016-12-30 2021-11-09 罗伯特·博世有限公司 Method and system for fuzzy keyword search of encrypted data
CN112422491A (en) * 2020-05-08 2021-02-26 上海幻电信息科技有限公司 Encryption and decryption method for digital codes, server and storage medium
CN111968379B (en) * 2020-08-10 2021-08-31 中化信息技术有限公司 Method, device, terminal and computer readable medium for entering license plate number
CN112181869A (en) * 2020-09-11 2021-01-05 中国银联股份有限公司 Information storage method, device, server and medium
CN112486976A (en) * 2020-12-18 2021-03-12 咪咕文化科技有限公司 Data processing method, device, network equipment and storage medium
CN112565776B (en) * 2021-02-25 2021-07-20 北京城建设计发展集团股份有限公司 Video transcoding compression method and system
CN116719476B (en) * 2023-05-26 2024-01-02 广州市玄武无线科技股份有限公司 Compressed storage method and device for mobile phone numbers, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0793349A2 (en) * 1996-02-29 1997-09-03 Gandalf Technologies Inc. Method and apparatus for performing data compression
CN1167951A (en) * 1996-01-31 1997-12-17 株式会社日立制作所 Method of and apparatus for compressing and expanding data and data processing apparatus and network system using same
CN101031086A (en) * 2002-10-10 2007-09-05 索尼株式会社 Video-information encoding method and video-information decoding method
CN102799590A (en) * 2011-05-26 2012-11-28 安凯(广州)微电子技术有限公司 Embedded type electronic product word stock as well as word stock generating method and word stock searching method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3302210B2 (en) * 1995-02-10 2002-07-15 富士通株式会社 Data encoding / decoding method and apparatus
US6100824A (en) * 1998-04-06 2000-08-08 National Dispatch Center, Inc. System and method for data compression
WO2007065352A1 (en) * 2005-12-05 2007-06-14 Huawei Technologies Co., Ltd. Method and apparatus for realizing arithmetic coding/ decoding
JP4555257B2 (en) * 2006-06-06 2010-09-29 パナソニック株式会社 Image encoding device
CN104579360B (en) * 2015-02-04 2018-07-31 华为技术有限公司 A kind of method and apparatus of data processing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
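Content-matching-based adaptive binary arithmetic coding in H.264; Shi Zengshuo; China Master's Theses Full-text Database, Information Science and Technology; 2007-06-15 (No. 6); pp. I136-347 *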
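An improved table-lookup arithmetic encoding and decoding method; Zhang Wenni et al.; Journal of Fudan University (Natural Science); 2006-02-28; Vol. 45 (No. 1); pp. 45-48 *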
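Character-type message compression technique based on adaptive arithmetic coding; Li Wei et al.; Science Technology and Engineering; 2013-04-30; Vol. 13 (No. 10); pp. 2836-2840 *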

Also Published As

Publication number Publication date
US9998145B2 (en) 2018-06-12
EP3244540A1 (en) 2017-11-15
EP3244540A4 (en) 2018-01-31
CN104579360A (en) 2015-04-29
US20170331492A1 (en) 2017-11-16
WO2016124070A1 (en) 2016-08-11

Similar Documents

Publication Publication Date Title
CN104579360B (en) A kind of method and apparatus of data processing
CN105684316B (en) Polar code encoding method and device
CN108388598B (en) Electronic device, data storage method, and storage medium
US8265407B2 (en) Method for coding and decoding 3D data implemented as a mesh model
US11178212B2 (en) Compressing and transmitting structured information
CN101350858B (en) Method for decoding short message and user terminal
CN116506073B (en) Industrial computer platform data rapid transmission method and system
US8838550B1 (en) Readable text-based compression of resource identifiers
JP5656593B2 (en) Apparatus and method for decoding encoded data
CN104657481A (en) Data storage method and device and data query method and device
CN104572994B (en) Method and apparatus for searching for data
CN103051480B (en) The storage means of a kind of DN and DN storage device
CN110266834B (en) Area searching method and device based on internet protocol address
CN109831544B (en) Code storage method and system applied to email address
CN112287638A (en) Digital display method and device
CN113742332A (en) Data storage method, device, equipment and storage medium
CN114500670B (en) Encoding compression method, decoding method and device
US8976048B2 (en) Efficient processing of Huffman encoded data
CN110377822A (en) Method, apparatus and electronic equipment for network characterisation study
CN107832341B (en) AGNSS user duplicate removal statistical method
CN106533450B (en) PMS code compression method and device
CN112232025B (en) Character string storage method and device and electronic equipment
CN110287147B (en) Character string sorting method and device
CN109660262A (en) A kind of character coding method and system applied to E-mail address
CN115001628B (en) Data encoding method and device, data decoding method and device and data structure

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant