CN109831544A - Coding and storage method and system applied to e-mail addresses - Google Patents


Info

Publication number
CN109831544A
Authority
CN
China
Prior art keywords
coding
mail address
character
node
user
Prior art date
Legal status
Granted
Application number
CN201910091867.5A
Other languages
Chinese (zh)
Other versions
CN109831544B (en)
Inventor
谢文辉
王敏
刘江桥
张�浩
汪翔
杨柳
周期律
常学亮
张轶
孙光辉
秦邱川
刘远奎
刘引
Current Assignee
Chongqing Rural Commercial Bank Co Ltd
Original Assignee
Chongqing Rural Commercial Bank Co Ltd
Priority date
2019-01-30
Filing date
2019-01-30
Publication date
2019-05-31
Application filed by Chongqing Rural Commercial Bank Co Ltd
Priority to CN201910091867.5A
Publication of CN109831544A
Application granted
Publication of CN109831544B
Status: Active
Anticipated expiration


Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a coding and storage method applied to e-mail addresses, comprising the following steps: splitting, encoding and compression. The invention addresses the problem that, in existing desensitization methods for e-mail addresses, storing the character-encoded result is inefficient and wastes space, and provides a coding and storage method and system applied to e-mail addresses. Because the Huffman coding table is derived from actual production data, encoding all e-mail addresses in the database with this table produces binary data that is smaller than the original storage. At the same time, owing to the properties of Huffman coding, the special storage scheme of the invention guarantees that the resulting binary code corresponds one-to-one with the original string, and the scheme needs only 3 extra bits per string, which makes it efficient.

Description

Coding and storage method and system applied to e-mail addresses
Technical field
The present invention relates to the field of code storage, and in particular to a coding and storage method and system applied to e-mail addresses.
Background art
A bank's database systems contain a large amount of sensitive personal information. In a bank's daily work, various data are needed at all times and the risk of data leakage is high, so sensitive information must be processed to hide private data.
An e-mail address is an important piece of private information: not only can its owner be contacted through the mailbox, but many websites and mobile apps can be bound to a mailbox, and some important accounts can even be recovered or reset through it. Because e-mail addresses currently receive little attention, the methods for desensitizing them are relatively simple and fall mainly into the following categories:
1. Symbol substitution: all (or some) letters are directly replaced with a special symbol (such as *).
2. Code shifting: the code of each letter is shifted by a fixed number of positions, e.g. a becomes b and b becomes c.
Both methods have drawbacks. The first hides sensitive information effectively, but after replacement multiple e-mail addresses map to the same masked value, which breaks the association between records: if two data tables contain the same e-mail address, an analyst cannot tell after desensitization that the two records refer to the same address. The second preserves the one-to-one association, but the offset can easily be inferred from known desensitization results, allowing the original data to be recovered.
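To make the two drawbacks concrete, here is a minimal Python sketch of both prior-art methods. The function names mask and shift are hypothetical, and shift rotates only lowercase letters (other characters are left unchanged) as a simplifying assumption.

```python
import string


def mask(address: str) -> str:
    """Method 1 (symbol substitution): replace every user-name character with '*'."""
    user, domain = address.split("@", 1)
    return "*" * len(user) + "@" + domain


def shift(address: str, k: int) -> str:
    """Method 2 (code shifting): rotate each lowercase letter of the user name by k."""
    user, domain = address.split("@", 1)
    letters = string.ascii_lowercase
    rotated = "".join(
        letters[(letters.index(ch) + k) % 26] if ch in letters else ch for ch in user
    )
    return rotated + "@" + domain


# Method 1: two different addresses collapse to the same value.
print(mask("wang@sina.com"), mask("chen@sina.com"))   # ****@sina.com ****@sina.com

# Method 2: one known (original, masked) pair reveals the offset k.
leaked = shift("wang@sina.com", 3)                     # zdqj@sina.com
k = (ord(leaked[0]) - ord("w")) % 26                   # attacker recovers k = 3
print(shift(leaked, -k))                               # wang@sina.com
```

The first print shows the lost association; the second shows that a single known pair is enough to undo the shift for every address.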
Summary of the invention
The present invention addresses the problem that, in existing desensitization methods for e-mail addresses, storing the character-encoded result is inefficient and wastes space, and provides a coding and storage method and system applied to e-mail addresses. Because the Huffman coding table is derived from actual production data, encoding all e-mail addresses in the database with this table produces binary data that is smaller than the original storage. At the same time, owing to the properties of Huffman coding, the special storage scheme of the invention guarantees that the resulting binary code corresponds one-to-one with the original string, and the scheme needs only 3 extra bits per string, which makes it efficient.
The present invention is achieved through the following technical solution:
A coding and storage method applied to e-mail addresses comprises the following steps:
A. Splitting: the e-mail address is split by character into a user-defined part and a server-defined part;
B. Encoding: the user-defined part of the e-mail address is encoded by Huffman coding;
C. Compression: the characters of the user-defined part of the e-mail address are traversed, and the coding result obtained in step B is stored in a byte array.
Further, in the coding and storage method applied to e-mail addresses, step A specifically comprises: splitting the characters of the e-mail address at the @ symbol into a user name and a domain name. The part before @ is the user name, i.e. the user-defined part; the part after @ is the domain name, i.e. the server-defined part.
Further, step B specifically comprises: counting the frequency of occurrence of each character in the user-defined part of the e-mail addresses, sorting all characters by frequency from high to low to create a sorted table, creating a Huffman coding table in the order of the sorted table, and encoding the user-defined part of the e-mail address according to the Huffman coding table.
Further, the characters comprise any combination of English letters, digits, full stops, hyphens and underscores.
Further, the process of creating the Huffman coding table from the sorted table in step B is specifically:
B1. Obtain the frequency of occurrence of each character in the user names of the e-mail addresses by counting or sampling. Create 39 nodes for the 39 characters that may appear in an e-mail address (English letters, digits, full stop, hyphen and underscore); each node holds the frequency value of its character. Sort the nodes by frequency from small to large and store them in a node array, denoted N1, N2, N3, ..., N39;
B2. Remove the first two nodes N1 and N2 from the node array, add their frequencies to create a new node P3, and make N1 and N2 the two child nodes of P3;
B3. Insert P3 back into the node array in order of frequency from small to large, where the frequency of P3 is the sum of the frequencies of N1 and N2;
B4. Repeat steps B2 and B3 until only one node R remains in the node array. R is the root of the binary tree, and the leaf nodes correspond to the 39 characters that may appear in an e-mail address;
B5. Traverse from R to each leaf node: append 0 to the path when moving to a left child and 1 when moving to a right child. When a leaf node is reached, the sequence of 0s and 1s on the path is the code of the character represented by that leaf;
B6. Store the code of each character in a hash table T for later use.
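Steps B1 to B6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the Node class and build_code_table name are hypothetical, the node array is kept sorted with bisect to mirror the described re-insertion (a heap is the more common choice), and ties between equal frequencies keep their existing order. Different tie-breaking gives different codes of the same optimal total length.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    freq: float
    char: Optional[str] = None       # set only on leaf nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_code_table(freqs: Dict[str, float]) -> Dict[str, str]:
    """Steps B1-B6: build a Huffman tree from character frequencies, return table T."""
    # B1: one leaf node per character, sorted by frequency from small to large.
    nodes = sorted((Node(f, c) for c, f in freqs.items()), key=lambda n: n.freq)
    while len(nodes) > 1:
        # B2: remove the two lowest-frequency nodes and make them children of a new node.
        n1, n2 = nodes.pop(0), nodes.pop(0)
        parent = Node(n1.freq + n2.freq, left=n1, right=n2)
        # B3: re-insert the new node so the array stays sorted by frequency.
        pos = bisect.bisect_right([n.freq for n in nodes], parent.freq)
        nodes.insert(pos, parent)
    root = nodes[0]  # B4: the last remaining node is the root R

    table: Dict[str, str] = {}

    def walk(node: Node, path: str) -> None:
        # B5: '0' for a left child, '1' for a right child; a leaf ends the path.
        if node.char is not None:
            table[node.char] = path or "0"  # a one-symbol alphabet still gets a code
            return
        walk(node.left, path + "0")
        walk(node.right, path + "1")

    walk(root, "")
    return table  # B6: the hash table T (a Python dict)


T = build_code_table({"a": 3, "b": 2, "c": 1})
print(T)  # {'a': '0', 'c': '10', 'b': '11'}
```

For the frequencies of the string aaabbc this yields a->0, c->10, b->11: different bit patterns from a table such as a->1, b->01, c->00, but the same code lengths and the same 9-bit total.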
Further, step B also comprises: encoding the server-defined part of the e-mail address by Huffman coding.
Further, the process in step B of encoding the server-defined part of the e-mail address by Huffman coding is specifically: counting the frequency of occurrence of each character in the server-defined part of the e-mail addresses, sorting all characters by frequency from high to low to create a sorted table, creating a Huffman coding table in the order of the sorted table, and encoding the server-defined part of the e-mail address according to the Huffman coding table.
Further, the detailed process in step C of storing the obtained coding result in a byte array is:
C1. Traverse the characters of the user-defined part of the e-mail address, encode each character, and store the result in a byte array. Starting from bit 3 of the first byte, 0s and 1s are written bit by bit from left to right, so that the codes of all characters in the user-defined part are written in sequence;
C2. Because the last byte of the array obtained in step C1 may not be completely filled, the first three bits of the first byte are used to indicate the highest bit position occupied in the last byte.
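Steps C1 and C2 can be sketched in Python as below, assuming a code table like the hash table T from step B. The name pack is hypothetical, and lowercasing the input is an assumption based on the case-insensitive alphabet. Bits are collected as '0'/'1' characters for readability; a production version would more likely use shifts and masks.

```python
from typing import Dict, List


def pack(user: str, table: Dict[str, str]) -> bytes:
    """Steps C1-C2: write Huffman codes after a 3-bit position indicator."""
    bits: List[str] = ["0", "0", "0"]      # bits 0-2 of the first byte: indicator slot
    for ch in user.lower():                 # letters are case-insensitive
        bits.extend(table[ch])              # C1: write each code left to right from bit 3
    last_bit = (len(bits) - 1) % 8          # highest bit index used in the last byte (0-7)
    bits[0:3] = format(last_bit, "03b")     # C2: record it in the position indicator
    bits.extend("0" * (-len(bits) % 8))     # zero-fill the rest of the last byte
    return bytes(int("".join(bits[i:i + 8]), 2) for i in range(0, len(bits), 8))


# Two codes 011 and 1110 (as in Embodiment 2); 'x' and 'y' are placeholder characters.
m1 = pack("xy", {"x": "011", "y": "1110"})
print(" ".join(format(b, "08b") for b in m1))  # 00101111 10000000
```

The last byte uses only bits 0 and 1, so the indicator holds 001.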
A code storage system applied to e-mail addresses comprises a splitting module, a character coding module and a compression module, wherein:
The splitting module is configured to split an e-mail address by character into a user-defined part and a server-defined part;
The character coding module is configured to encode, by Huffman coding, the user-defined part of the e-mail address produced by the splitting module;
The compression module is configured to traverse the characters of the user-defined part of the e-mail address and store the coding result produced by the character coding module in a byte array.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. Because the Huffman coding table is derived from actual production data, encoding all e-mail addresses in the database with it produces binary data that is smaller than the original storage.
2. Owing to the properties of Huffman coding, the special storage scheme of the invention guarantees that the resulting binary code corresponds one-to-one with the original string.
3. The storage scheme of the invention needs only 3 extra bits per string, which makes it efficient.
Description of the drawings
The accompanying drawings described here provide a further understanding of the embodiments of the present invention and form part of this application; they do not limit the embodiments of the present invention. In the drawings:
Fig. 1 is a flow diagram of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the drawings. The exemplary embodiments of the invention and their descriptions serve only to explain the invention and are not intended to limit it.
Embodiment 1
As shown in Fig. 1, a method for desensitizing sensitive information in e-mail addresses comprises the following steps:
A. Splitting: the e-mail address is split by character into a user-defined part and a server-defined part. Specifically, the characters of the e-mail address are split at the @ symbol into a user name and a domain name; the part before @ is the user name, i.e. the user-defined part, and the part after @ is the domain name, i.e. the server-defined part.
B. Encoding: the user-defined part of the e-mail address is encoded by Huffman coding. Specifically, the frequency of occurrence of each character in the user-defined part of the e-mail addresses is counted, all characters are sorted by frequency from high to low to create a sorted table, a Huffman coding table is created in the order of the sorted table, and the user-defined part of the e-mail address is encoded according to the Huffman coding table. The characters comprise any combination of English letters, digits, full stops, hyphens and underscores.
The process of creating the Huffman coding table from the sorted table is specifically:
B1. Obtain the frequency of occurrence of each character in the user names of the e-mail addresses by counting or sampling. Create 39 nodes for the 39 characters that may appear in an e-mail address (English letters, digits, full stop, hyphen and underscore); each node holds the frequency value of its character. Sort the nodes by frequency from small to large and store them in a node array, denoted N1, N2, N3, ..., N39;
B2. Remove the first two nodes N1 and N2 from the node array, add their frequencies to create a new node P3, and make N1 and N2 the two child nodes of P3;
B3. Insert P3 back into the node array in order of frequency from small to large, where the frequency of P3 is the sum of the frequencies of N1 and N2;
B4. Repeat steps B2 and B3 until only one node R remains in the node array. R is the root of the binary tree, and the leaf nodes correspond to the 39 characters that may appear in an e-mail address;
B5. Traverse from R to each leaf node: append 0 to the path when moving to a left child and 1 when moving to a right child. When a leaf node is reached, the sequence of 0s and 1s on the path is the code of the character represented by that leaf;
B6. Store the code of each character in a hash table T for later use.
Step B further comprises encoding the server-defined part of the e-mail address by Huffman coding. Because characters repeat more often in domain names than in user names, if the domain name of the e-mail address also needs to be desensitized, the best approach is to sample the character frequencies of domain names separately and create a separate coding table. Specifically, the frequency of occurrence of each character in the server-defined part of the e-mail addresses is counted, all characters are sorted by frequency from high to low to create a sorted table, a Huffman coding table is created in the order of the sorted table, and the server-defined part of the e-mail address is encoded according to the Huffman coding table.
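The frequency statistics behind the sorted table, kept separate for user names and domain names, can be sketched in Python as below. The function name frequency_table and the sample addresses are hypothetical; seeding all 39 symbols with a count of 0 (so that characters unseen in the sample still receive a code) and breaking ties alphabetically are assumptions of this sketch.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# The 39 allowed symbols: 26 letters (case-insensitive), 10 digits, '.', '-', '_'.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789.-_"


def frequency_table(addresses: Iterable[str], part: str = "user") -> List[Tuple[str, int]]:
    """Count characters in the user part (or the domain part) and sort high to low."""
    counts = Counter({ch: 0 for ch in ALPHABET})   # unseen symbols still get a code later
    for address in addresses:
        user, _, domain = address.lower().partition("@")
        text = user if part == "user" else domain
        counts.update(ch for ch in text if ch in ALPHABET)
    # Sorted table: highest frequency first; ties broken alphabetically (an assumption).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


sample = ["wang@sina.com", "chen@sina.com", "li.na@qq.com"]
print(frequency_table(sample, "user")[:3])    # [('n', 3), ('a', 2), ('.', 1)]
print(frequency_table(sample, "domain")[:3])  # [('.', 3), ('c', 3), ('m', 3)]
```

Even in this tiny sample the domain counts are more concentrated than the user-name counts, which is the reason given for building a separate domain table. Either sorted list can be converted with dict() and passed to a Huffman tree builder.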
C. Compression: the characters of the user-defined part of the e-mail address are traversed, and the coding result obtained in step B is stored in a byte array. The detailed process is:
C1. Traverse the characters of the user-defined part of the e-mail address, encode each character, and store the result in a byte array. Starting from bit 3 of the first byte, 0s and 1s are written bit by bit from left to right, so that the codes of all characters in the user-defined part are written in sequence;
C2. Because the last byte of the array obtained in step C1 may not be completely filled, the first three bits of the first byte are used to indicate the highest bit position occupied in the last byte.
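The description does not spell out a decoder, but reversing C1 and C2 shows why the result is one-to-one: Huffman codes are prefix-free, and the position indicator marks where the valid bits end. Without it, the zero padding could be misread as extra characters, because the leftmost leaf of the tree has an all-zero code. A hypothetical unpack function, as a Python sketch:

```python
from typing import Dict


def unpack(data: bytes, table: Dict[str, str]) -> str:
    """Reverse of C1-C2: read the position indicator, drop padding, decode the codes."""
    bits = "".join(format(b, "08b") for b in data)
    last_bit = int(bits[:3], 2)                       # highest bit used in the last byte
    payload = bits[3:len(bits) - 8 + last_bit + 1]    # skip indicator, cut trailing zeros
    reverse = {code: ch for ch, code in table.items()}
    out, current = [], ""
    for bit in payload:
        current += bit
        if current in reverse:                        # prefix-free: first match is a symbol
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("payload ends in the middle of a code")
    return "".join(out)


table = {"c": "1011", "h": "10000", "e": "01100", "n": "010"}
print(unpack(bytes([0b01110111, 0b00000110, 0b00100000]), table))  # chen
```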
Embodiment 2
Based on Embodiment 1, this embodiment provides an e-mail address desensitization algorithm comprising the following steps:
Step A splitting -> step B encoding -> step C compression
Each step is described below.
Step A: splitting
First, the user name V1 and the domain name V2 of the mailbox are separated at the @ symbol of the e-mail address. For example, for wang@sina.com, V1 is wang and V2 is sina.com. Depending on business requirements, V1 and V2 can each be desensitized, or only V1. In the present invention, the principle is illustrated by desensitizing only the user name V1.
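Step A amounts to a single split. A minimal Python sketch with a hypothetical split_address function; splitting at the last @ is a choice of this sketch (with the 39-character alphabet, the user name cannot contain @ anyway), and rejecting inputs without a user name or domain is an added assumption.

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Step A: split an address at '@' into the user name V1 and the domain name V2."""
    v1, at, v2 = address.rpartition("@")
    if not at or not v1 or not v2:
        raise ValueError(f"not a valid e-mail address: {address!r}")
    return v1, v2


print(split_address("wang@sina.com"))  # ('wang', 'sina.com')
```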
Step B: encoding; step C: compression
The characters allowed in an e-mail address are letters, digits, the full stop (.), the hyphen (-) and the underscore (_), five classes totalling 39 characters (English letters are case-insensitive). Their frequency distribution is markedly skewed: in statistics from our bank's system, the six most frequent letters together account for more than 50% of all occurrences. The present invention therefore applies Huffman coding, which is widely used in data compression, to store e-mail addresses compactly.
Summary of the Huffman coding principle: Huffman coding is a variable-length, prefix-free code (no code is a prefix of another). Its basic principle is to represent the most frequent characters with the shortest codes and the least frequent characters with the longest codes. For example, for aaabbc with the coding table a->1, b->01, c->00, the string is represented as 111010100, 9 bits in total, which needs two bytes of storage, whereas the original string needs 6 bytes (one byte per character).
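The aaabbc example can be checked with a few lines of Python. The function name encode is hypothetical; the table is the example table from the text.

```python
def encode(text: str, table: dict) -> str:
    """Concatenate the code of each character (prefix-free codes need no separators)."""
    return "".join(table[ch] for ch in text)


table = {"a": "1", "b": "01", "c": "00"}   # example coding table from the text
bits = encode("aaabbc", table)
print(bits, len(bits))                      # 111010100 9
print((len(bits) + 7) // 8, "bytes instead of", len("aaabbc"))  # 2 bytes instead of 6
```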
In the present invention, before desensitization starts, a Huffman coding table (a binary tree structure) is first created using the widely used algorithm, summarized as follows:
1. Obtain by counting or sampling the frequency of each character in the user names of the e-mail addresses in the system, and create 39 nodes for the 39 characters that may appear in an e-mail address. Each node holds its frequency value; the nodes are sorted by frequency from small to large and stored in a node array:
N1, N2, N3, ..., N39
2. Remove the first two nodes N1 and N2 from the array, add their frequencies to create a new node P3, and make N1 and N2 the two child nodes of P3.
3. Insert P3 back into the array in order of frequency from small to large (its frequency is the sum of those of N1 and N2).
4. Repeat steps 2-3 until only one node R remains in the array. R is the root of the binary tree, and the leaf nodes correspond to the 39 characters that may appear in an e-mail address.
5. Traverse from R to each leaf node: append 0 to the path for a left child and 1 for a right child. When a leaf is reached, the sequence of 0s and 1s on the path is the code of the letter represented by that leaf.
6. Store the code of each letter in a hash table T for use in later steps.
7. Note: because characters repeat more often in domain names than in user names, if the domain name of the e-mail address also needs to be desensitized, a suggested approach is to sample the character frequencies of domain names separately and create a separate coding table.
After obtaining the coding table, the present invention encodes V1 obtained in step A with it and then stores the result using a special storage scheme, as follows:
1. Traverse the characters of V1; for each character C, look up its code in the hash table T obtained in the previous step and store it in a byte array M1. Starting from bit 3 of the first byte (bit positions are numbered from 0), 0s and 1s are written from left to right.
For example, if the first character is encoded as 011, the first byte of M1 is as follows:
Bit 0 1 2 3 4 5 6 7
Binary value 0 0 0 0 1 1 0 0
If the second character is encoded as 1110, M1 after writing it is as follows:
Bit 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
Binary value 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0
The codes of all characters in V1 are written in turn in this way.
2. After every code of V1 has been written, the last byte of the array obtained in step 1 is generally not completely filled. In the example of step 1, after two characters have been written, the last byte actually uses only bits 0 and 1. In practice the last byte may end at any of bits 0-7, so the present invention uses the first three bits of the first byte (hereinafter the "position indicator") to indicate the highest bit occupied in the last byte (three bits can represent exactly the values 0-7). In the example of step 1, the highest bit occupied in the last byte is 1, so the first three bits of the first byte are 001, and the final byte array M1 is as follows:
Bit 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
Binary value 0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0
Owing to the properties of Huffman coding and the special storage scheme of the present invention, the resulting binary code corresponds one-to-one with the original string, and the scheme needs only 3 extra bits per string, which makes it efficient.
Embodiment 3
Based on Embodiments 1 and 2, chen@sina.com is desensitized. The Huffman coding table created from samples gives the following codes (excerpt):
Character Code
c 1011
h 10000
e 01100
n 010
Step A: splitting
Splitting chen@sina.com gives V1: chen
Step B encoding and step C compression: V1 is encoded. First, the code of c, 1011, is inserted:
Bit 0 1 2 3 4 5 6 7
Binary value 0 0 0 1 0 1 1 0
(Data are stored in 8-bit bytes, so bit 7 is filled with 0; the same applies below.)
Next, the code of h is inserted:
Bit 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
Binary value 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0
Then the code of e is inserted:
Bit 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
Binary value 0 0 0 1 0 1 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
Finally, the code of n is inserted and the position indicator is updated (the last occupied bit of the last byte is bit 3, so 3, i.e. 011, is written to the indicator bits):
Bit 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
Binary value 0 1 1 1 0 1 1 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0
That is, the compressed binary code M1 of chen is
011101110000011000100000
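The bit arithmetic of this example can be reproduced in a few lines of Python, using only the four codes from the excerpted table; the variable names are illustrative.

```python
# Partial coding table from the example (the full table has 39 entries).
codes = {"c": "1011", "h": "10000", "e": "01100", "n": "010"}

payload = "".join(codes[ch] for ch in "chen")     # 4 + 5 + 5 + 3 = 17 bits
total = 3 + len(payload)                          # plus the 3-bit position indicator: 20 bits
indicator = format((total - 1) % 8, "03b")        # last used bit 19 is bit 3 of byte 3 -> "011"
n_bytes = (total + 7) // 8                        # 3 bytes
m1 = (indicator + payload).ljust(8 * n_bytes, "0")
print(m1)  # 011101110000011000100000
```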
Step D: binary-to-text conversion
M1 is converted with Base64 encoding.
Converting M1 gives R1: dwYg
Step E: reassembling the mailbox address
Concatenating R1 with the original domain name V2 obtained in step A gives the final desensitized value dwYg@sina.com
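Steps D and E can be checked with Python's standard base64 module, starting from the M1 bit string above. This sketch uses the standard Base64 alphabet; the description does not say which variant is used, and standard Base64 can also emit +, / and = padding, which a deployment may want to avoid in e-mail-like output.

```python
import base64

m1 = "011101110000011000100000"                     # compressed user name of chen
raw = int(m1, 2).to_bytes(len(m1) // 8, "big")      # 3 bytes: 0x77 0x06 0x20
r1 = base64.b64encode(raw).decode("ascii")          # step D: Base64 text
masked = r1 + "@" + "sina.com"                      # step E: reattach the original domain V2
print(masked)  # dwYg@sina.com
```

The 24 bits split into four 6-bit groups (29, 48, 24, 32), which map to d, w, Y and g.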
Embodiment 4
A system for desensitizing sensitive information in e-mail addresses comprises a splitting module, a character coding module and a compression module, wherein:
The splitting module is configured to split an e-mail address by character into a user-defined part and a server-defined part;
The character coding module is configured to encode, by Huffman coding, the user-defined part of the e-mail address produced by the splitting module;
The compression module is configured to traverse the characters of the user-defined part of the e-mail address and store the coding result produced by the character coding module in a byte array.
Data desensitized by the present invention hide the sensitive information well, and the original data cannot easily be cracked even from a large amount of desensitized data. The coding process of the invention can be carried out entirely with binary operations such as shift, AND and OR, which current CPUs execute very efficiently. Because the Huffman coding table is derived from actual production data, encoding all e-mail addresses in the database with it saves about 35% of the original space.
The specific embodiments described above explain the objectives, technical solutions and beneficial effects of the present invention in further detail. It should be understood that they are merely specific embodiments of the invention and are not intended to limit its scope of protection; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (9)

1. A coding and storage method applied to e-mail addresses, characterized in that it comprises the following steps:
A. splitting: splitting the e-mail address by character into a user-defined part and a server-defined part;
B. encoding: encoding the user-defined part of the e-mail address by Huffman coding;
C. compression: traversing the characters of the user-defined part of the e-mail address and storing the coding result obtained in step B in a byte array.
2. The coding and storage method applied to e-mail addresses according to claim 1, characterized in that step A specifically comprises: splitting the characters of the e-mail address at the @ symbol into a user name and a domain name, wherein the part before @ is the user name, i.e. the user-defined part, and the part after @ is the domain name, i.e. the server-defined part.
3. The coding and storage method applied to e-mail addresses according to claim 1, characterized in that step B specifically comprises: counting the frequency of occurrence of each character in the user-defined part of the e-mail addresses, sorting all characters by frequency from high to low to create a sorted table, creating a Huffman coding table in the order of the sorted table, and encoding the user-defined part of the e-mail address according to the Huffman coding table.
4. The coding and storage method applied to e-mail addresses according to claim 3, characterized in that the characters comprise any combination of English letters, digits, full stops, hyphens and underscores.
5. The coding and storage method applied to e-mail addresses according to claim 4, characterized in that the process of creating the Huffman coding table from the sorted table in step B is specifically:
B1. obtaining the frequency of occurrence of each character in the user names of the e-mail addresses by counting or sampling; creating 39 nodes for the 39 characters that may appear in an e-mail address (English letters, digits, full stop, hyphen and underscore), each node holding the frequency value of its character; sorting the nodes by frequency from small to large and storing them in a node array, denoted N1, N2, N3, ..., N39;
B2. removing the first two nodes N1 and N2 from the node array, adding their frequencies to create a new node P3, and making N1 and N2 the two child nodes of P3;
B3. inserting P3 back into the node array in order of frequency from small to large, wherein the frequency of P3 is the sum of the frequencies of N1 and N2;
B4. repeating steps B2 and B3 until only one node R remains in the node array, wherein R is the root of the binary tree and the leaf nodes correspond to the 39 characters that may appear in an e-mail address;
B5. traversing from R to each leaf node, appending 0 to the path for a left child and 1 for a right child, until a leaf node is reached, wherein the sequence of 0s and 1s on the path is the code of the character represented by that leaf;
B6. storing the code of each character in a hash table T for later use.
6. The coding and storage method applied to e-mail addresses according to claim 1, characterized in that step B further comprises: encoding the server-defined part of the e-mail address by Huffman coding.
7. The coding and storage method applied to e-mail addresses according to claim 6, characterized in that the process in step B of encoding the server-defined part of the e-mail address by Huffman coding is specifically: counting the frequency of occurrence of each character in the server-defined part of the e-mail addresses, sorting all characters by frequency from high to low to create a sorted table, creating a Huffman coding table in the order of the sorted table, and encoding the server-defined part of the e-mail address according to the Huffman coding table.
8. The coding and storage method applied to e-mail addresses according to claim 1, characterized in that the detailed process in step C of storing the obtained coding result in a byte array is:
C1. traversing the characters of the user-defined part of the e-mail address, encoding each character, and storing the result in a byte array, wherein 0s and 1s are written bit by bit from left to right starting from bit 3 of the first byte, so that the codes of all characters in the user-defined part are written in sequence;
C2. because the last byte of the array obtained in step C1 may not be completely filled, using the first three bits of the first byte to indicate the highest bit position occupied in the last byte.
9. A code storage system applied to e-mail addresses, characterized in that it comprises a splitting module, a character coding module and a compression module, wherein:
the splitting module is configured to split an e-mail address by character into a user-defined part and a server-defined part;
the character coding module is configured to encode, by Huffman coding, the user-defined part of the e-mail address produced by the splitting module;
the compression module is configured to traverse the characters of the user-defined part of the e-mail address and store the coding result produced by the character coding module in a byte array.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910091867.5A CN109831544B (en) 2019-01-30 2019-01-30 Code storage method and system applied to email address

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910091867.5A CN109831544B (en) 2019-01-30 2019-01-30 Code storage method and system applied to email address

Publications (2)

Publication Number Publication Date
CN109831544A (en) 2019-05-31
CN109831544B (en) 2021-10-08

Family

ID=66862950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910091867.5A Active CN109831544B 2019-01-30 2019-01-30 Code storage method and system applied to email address

Country Status (1)

Country Link
CN (1) CN109831544B

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69332253D1 * 1992-10-13 2002-10-02 Nec Corp Decoding circuit for Huffman codes
CN101022552A * 2007-03-13 2007-08-22 北京中星微电子有限公司 Method and device for realizing Huffman decoding
US20130181851A1 * 2012-01-17 2013-07-18 Fujitsu Limited Encoding method, encoding apparatus, decoding method, decoding apparatus, and system
CN104283568A * 2013-07-12 2015-01-14 中国科学院声学研究所 Data compression encoding method based on a partial Huffman tree
CN106203139A * 2016-07-13 2016-12-07 成都知道创宇信息技术有限公司 Data partial desensitization method
CN104283567B * 2013-07-02 2018-07-03 北京四维图新科技股份有限公司 Name data compression and decompression method and device
CN109120273A * 2018-08-29 2019-01-01 重庆物奇科技有限公司 Encoding device, decoding device and system based on Huffman coding

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN111506781A * 2020-04-21 2020-08-07 四川创智联恒科技有限公司 Method, system, terminal device and readable storage medium for greatly compressing volume of database
CN113301175A * 2020-07-14 2021-08-24 阿里巴巴集团控股有限公司 Service calling method, data storage method, device, equipment and storage medium
CN113301175B * 2020-07-14 2022-04-12 阿里巴巴集团控股有限公司 Service calling method, data storage method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant