CN109831544A - Coding and storage method and system applied to e-mail addresses - Google Patents
- Publication number: CN109831544A (application CN201910091867.5A)
- Authority: CN (China)
- Prior art keywords: coding, mail address, character, node, user
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a coding and storage method applied to e-mail addresses, carried out in the following steps: splitting; coding; compression. The invention addresses the low storage efficiency and wasted space that arise, in prior-art e-mail address desensitization methods, when the character-encoded result is stored, and provides a coding and storage method and system applied to e-mail addresses. Because the Huffman coding table used by the method is derived from actual production data, encoding every mailbox in the database with that table yields binary data that occupies less space than the original. In addition, owing to the properties of Huffman coding, the special storage format of the invention guarantees that the resulting binary code corresponds one-to-one with the original character string, and the format needs only 3 additional binary digits (bits) per character string, so it is efficient.
Description
Technical field
The present invention relates to the field of coded storage, and in particular to a coding and storage method and system applied to e-mail addresses.
Background
The database systems of a bank hold a large amount of privacy-sensitive personal information. The bank's daily work constantly requires various data, so the risk of data leakage is high. Sensitive information therefore needs to be processed so that privacy-sensitive data is hidden.
An e-mail address is important private information for an individual. Not only can its owner be contacted through the mailbox, but many websites and mobile apps can also be bound to a mailbox, and the account-recovery function of some important accounts can even be reset through the mailbox. Because e-mail addresses currently receive little attention, the desensitization methods applied to them are fairly simple and fall mainly into the following categories:
1. Symbol replacement: replace all (or some) of the letters directly with a special symbol (such as *).
2. Shift coding: shift the code of each letter by a fixed number of positions, for example a becomes b and b becomes c.
Both methods have drawbacks. The first hides sensitive information effectively, but after replacement multiple e-mail addresses map to the same masked address, which breaks the association between data. For example, if two data tables contain the same mailbox address, then after desensitization with this method an analyst cannot tell that the two records correspond to the same address. The second preserves the one-to-one association, but the offset parameter is easy to derive from known desensitization results, and with it the original data.
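The two prior-art methods and their weaknesses can be illustrated with a minimal Python sketch. The function names `mask_username` and `shift_username`, the choice to keep the first character when masking, and the sample addresses are assumptions made for illustration; the source only describes the methods in general terms.

```python
def mask_username(email: str, keep: int = 1) -> str:
    """Symbol replacement: keep the first `keep` characters, star out the rest."""
    user, _, domain = email.partition("@")
    return user[:keep] + "*" * (len(user) - keep) + "@" + domain


def shift_username(email: str, offset: int = 1) -> str:
    """Shift coding: move each lowercase letter `offset` places (a->b, b->c)."""
    user, _, domain = email.partition("@")
    shifted = "".join(
        chr((ord(ch) - ord("a") + offset) % 26 + ord("a")) if "a" <= ch <= "z" else ch
        for ch in user
    )
    return shifted + "@" + domain


# Weakness 1: two different addresses collapse to the same masked value,
# so records can no longer be matched reliably on the desensitized column.
masked_a = mask_username("wang@sina.com")
masked_b = mask_username("wing@sina.com")

# Weakness 2: a single known plaintext/ciphertext pair reveals the offset
# that was applied to every record.
leaked_plain = "wang@sina.com"
leaked_cipher = shift_username(leaked_plain, 3)
recovered_offset = (ord(leaked_cipher[0]) - ord(leaked_plain[0])) % 26
```

Here `masked_a` and `masked_b` are identical even though the inputs differ, and `recovered_offset` equals the secret offset, which is the pair of problems the background describes.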
Summary of the invention
The present invention addresses the low storage efficiency and wasted space that arise, in prior-art e-mail address desensitization methods, when the character-encoded result is stored, and provides a coding and storage method and system applied to e-mail addresses. Because the Huffman coding table used is derived from actual production data, encoding every mailbox in the database with that table yields binary data that occupies less space than the original. In addition, owing to the properties of Huffman coding, the special storage format of the invention guarantees that the resulting binary code corresponds one-to-one with the original character string, and the format needs only 3 additional binary digits (bits) per character string, so it is efficient.
The present invention is achieved through the following technical solution:
A coding and storage method applied to e-mail addresses, carried out in the following steps:
A. Splitting: split the e-mail address, by character, into a user-defined part and a server-defined part;
B. Coding: encode the user-defined part of the e-mail address by the Huffman coding method;
C. Compression: traverse the characters of the user-defined part of the e-mail address and store the coding result obtained in step B in a byte array.
Further, in the coding and storage method applied to e-mail addresses, the detailed process of step A is: split the characters of the e-mail address into a user name and a domain name at the @ symbol. The part before the @ symbol is the user name, i.e. the user-defined part; the part after the @ symbol is the domain name, i.e. the server-defined part.
Further, in the coding and storage method applied to e-mail addresses, step B is specifically: count the frequency of occurrence of each character of the user-defined part in e-mail addresses, sort all characters by frequency of occurrence from high to low to create a ranking table, create a Huffman coding table according to the order of the ranking table, and encode the user-defined part of the e-mail address according to the Huffman coding table.
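The frequency count and ranking table of step B can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `rank_characters`, the alphabetical tie-breaking rule, and the four sample user names are hypothetical and do not come from the source or from any production data.

```python
from collections import Counter
import string

# The 39 characters the source allows in an e-mail address (case-insensitive):
# 26 letters, 10 digits, full stop, hyphen and underscore.
ALPHABET = string.ascii_lowercase + string.digits + ".-_"


def rank_characters(usernames):
    """Count how often each allowed character occurs and rank by frequency."""
    counts = Counter({ch: 0 for ch in ALPHABET})  # unseen characters keep count 0
    for name in usernames:
        counts.update(ch for ch in name.lower() if ch in ALPHABET)
    # Highest frequency first; ties broken alphabetically so the result is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


sample = ["wang", "chen", "li.wei", "zhang_01"]  # hypothetical sample, not real data
ranking = rank_characters(sample)
```

The resulting `ranking` list always has 39 entries, one per allowed character, so it can feed directly into the tree construction that follows.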
Further, in the coding and storage method applied to e-mail addresses, the characters comprise any combination of English letters, digits, the full stop, the hyphen and the underscore.
Further, in the coding and storage method applied to e-mail addresses, the process of creating the Huffman coding table from the ranking table in step B is specifically:
B1. Obtain, by counting or sampling, the frequency of occurrence of each character in the user names of e-mail addresses. Create 39 nodes for the 39 characters that may appear in an e-mail address (English letters, digits, the full stop, the hyphen and the underscore). Each node contains the frequency value of its character. Sort the nodes by frequency value from small to large and store them in a node array, denoted N1, N2, N3, ..., N39;
B2. Remove the first two nodes N1 and N2 from the node array, add their frequencies to create a new node P3, and make N1 and N2 the two child nodes of P3;
B3. Insert P3 back into the node array at the position given by its frequency, keeping the array sorted from small to large, where the frequency of P3 is the sum of the frequencies of N1 and N2;
B4. Repeat steps B2 and B3 until only one node R remains in the node array. R is the root node of the binary tree, and the leaf nodes correspond to the 39 characters that may appear in an e-mail address;
B5. Starting from R, traverse each child node: append 0 to the path for a left child and 1 for a right child, until a leaf node is reached. The sequence of 0s and 1s on the path is the code of the character represented by that leaf node;
B6. Store the code of each character in a hash table T for subsequent use.
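Steps B1 to B6 can be sketched in Python as below. This is a minimal sketch under stated assumptions: the `Node` class and `build_code_table` function are hypothetical names, and the three-character frequency table is a toy input rather than the 39-character production table the source describes. Which concrete 0/1 pattern each character receives depends on how ties are ordered, so only the code lengths should be relied on.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Node:
    freq: int                                                      # only field used for ordering
    char: Optional[str] = field(default=None, compare=False)       # set on leaf nodes only
    left: Optional["Node"] = field(default=None, compare=False)
    right: Optional["Node"] = field(default=None, compare=False)


def build_code_table(freqs):
    """Build the hash table T (character -> bit string) following steps B1-B6."""
    # B1: one node per character, sorted by ascending frequency.
    nodes = sorted((Node(f, ch) for ch, f in freqs.items()), key=lambda n: n.freq)
    # B2-B4: repeatedly merge the two lowest-frequency nodes and re-insert the parent.
    while len(nodes) > 1:
        n1, n2 = nodes.pop(0), nodes.pop(0)
        parent = Node(n1.freq + n2.freq, None, n1, n2)
        bisect.insort(nodes, parent)          # keeps the array sorted by frequency
    root = nodes[0]

    # B5: walk from the root, appending 0 for a left child and 1 for a right child.
    table = {}

    def walk(node, path):
        if node.char is not None:             # leaf reached: path is the code
            table[node.char] = path or "0"    # single-character alphabet edge case
            return
        walk(node.left, path + "0")
        walk(node.right, path + "1")

    walk(root, "")
    return table                              # B6: the hash table T


# Toy frequencies (hypothetical): the most frequent character gets the shortest code.
T = build_code_table({"a": 3, "b": 2, "c": 1})
```

Popping from the front of a list and re-inserting with `bisect.insort` mirrors the array-based wording of the steps; a heap would be the more common choice for large alphabets, but with 39 symbols the difference is negligible.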
Further, in the coding and storage method applied to e-mail addresses, step B further comprises: encoding the server-defined part of the e-mail address by the Huffman coding method.
Further, in the coding and storage method applied to e-mail addresses, the process in step B of encoding the server-defined part of the e-mail address by the Huffman coding method is specifically: count the frequency of occurrence of each character of the server-defined part in e-mail addresses, sort all characters by frequency of occurrence from high to low to create a ranking table, create a Huffman coding table according to the order of the ranking table, and encode the server-defined part of the e-mail address according to the Huffman coding table.
Further, in the coding and storage method applied to e-mail addresses, the detailed process in step C of storing the obtained coding result in a byte array is:
C1. Traverse the characters of the user-defined part of the e-mail address, encode each character, and store the result in a byte array. The storage format writes 0 or 1 into successive binary digits from left to right, starting at bit 3 of the first byte (bits are numbered from 0), until the codes of all characters of the user-defined part have been written;
C2. Because the last byte of the byte array obtained in step C1 is not necessarily completely filled, the first three binary digits of the first byte are used to indicate the highest bit position occupied in the last byte.
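Steps C1 and C2 can be sketched with plain shift and OR operations. The function name `pack_codes` is hypothetical, and the codes are passed as strings of "0" and "1" for readability, which is an assumption about the interface rather than something the source specifies.

```python
def pack_codes(codes):
    """Pack Huffman code strings into bytes with a 3-bit position indicator."""
    buf = bytearray([0])
    pos = 3                                  # next free bit; bits 0-2 of byte 0 are reserved
    for code in codes:                       # C1: write every code, left to right
        for bit in code:
            byte_index, bit_index = divmod(pos, 8)
            if byte_index == len(buf):
                buf.append(0)                # start a new, zero-filled byte
            if bit == "1":
                buf[byte_index] |= 0x80 >> bit_index   # bit 0 is the leftmost bit
            pos += 1
    last_used = (pos - 1) % 8                # C2: highest occupied position in the last byte
    buf[0] |= last_used << 5                 # store it in the first three bits of byte 0
    return bytes(buf)


# Two characters with codes 011 and 1110 occupy 3 + 7 = 10 bits, i.e. two bytes,
# and the last byte is used up to position 1, so the indicator is 001.
packed = pack_codes(["011", "1110"])
```

Unused trailing bits of the last byte stay 0, and the indicator is what lets a decoder tell those padding zeros apart from real code bits.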
A code storage system applied to e-mail addresses comprises a splitting module, a character coding module and a compression module, wherein:
The splitting module is used to split the e-mail address, by character, into a user-defined part and a server-defined part;
The character coding module is used to encode, by the Huffman coding method, the user-defined part of the e-mail address produced by the splitting module;
The compression module is used to traverse the characters of the user-defined part of the e-mail address and to store the coding result produced by the character coding module in a byte array.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. Because the Huffman coding table used by the invention is derived from actual production data, encoding every mailbox in the database with that table yields binary data that occupies less space than the original.
2. Owing to the properties of Huffman coding, the special storage format of the invention guarantees that the resulting binary code corresponds one-to-one with the original character string.
3. The storage format of the invention needs only 3 additional binary digits (bits) per character string, so it is efficient.
Brief description of the drawings
The accompanying drawing described here is provided for further understanding of the embodiments of the invention and forms part of this application; it does not limit the embodiments of the invention. In the drawings:
Fig. 1 is a flow diagram of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments and the accompanying drawing. The exemplary embodiments and their descriptions serve only to explain the invention and do not limit it.
Embodiment 1
As shown in Fig. 1, a method for desensitizing sensitive information in e-mail addresses is carried out in the following steps:
A. Splitting: split the e-mail address, by character, into a user-defined part and a server-defined part. The detailed process is: split the characters of the e-mail address into a user name and a domain name at the @ symbol. The part before the @ symbol is the user name, i.e. the user-defined part; the part after the @ symbol is the domain name, i.e. the server-defined part.
B. Coding: encode the user-defined part of the e-mail address by the Huffman coding method. Specifically: count the frequency of occurrence of each character of the user-defined part in e-mail addresses, sort all characters by frequency of occurrence from high to low to create a ranking table, create a Huffman coding table according to the order of the ranking table, and encode the user-defined part of the e-mail address according to the Huffman coding table. The characters comprise any combination of English letters, digits, the full stop, the hyphen and the underscore.
The process of creating the Huffman coding table from the ranking table is specifically:
B1. Obtain, by counting or sampling, the frequency of occurrence of each character in the user names of e-mail addresses. Create 39 nodes for the 39 characters that may appear in an e-mail address (English letters, digits, the full stop, the hyphen and the underscore). Each node contains the frequency value of its character. Sort the nodes by frequency value from small to large and store them in a node array, denoted N1, N2, N3, ..., N39;
B2. Remove the first two nodes N1 and N2 from the node array, add their frequencies to create a new node P3, and make N1 and N2 the two child nodes of P3;
B3. Insert P3 back into the node array at the position given by its frequency, keeping the array sorted from small to large, where the frequency of P3 is the sum of the frequencies of N1 and N2;
B4. Repeat steps B2 and B3 until only one node R remains in the node array. R is the root node of the binary tree, and the leaf nodes correspond to the 39 characters that may appear in an e-mail address;
B5. Starting from R, traverse each child node: append 0 to the path for a left child and 1 for a right child, until a leaf node is reached. The sequence of 0s and 1s on the path is the code of the character represented by that leaf node;
B6. Store the code of each character in a hash table T for subsequent use.
Step B further comprises: encoding the server-defined part of the e-mail address by the Huffman coding method. Because characters repeat more often in domain names than in user names, if the domain name of the e-mail address also needs to be desensitized, the best approach is to sample the character frequencies of domain names separately and create another coding table. Specifically: count the frequency of occurrence of each character of the server-defined part in e-mail addresses, sort all characters by frequency of occurrence from high to low to create a ranking table, create a Huffman coding table according to the order of the ranking table, and encode the server-defined part of the e-mail address according to the Huffman coding table.
C. Compression: traverse the characters of the user-defined part of the e-mail address and store the coding result obtained in step B in a byte array. The detailed process is:
C1. Traverse the characters of the user-defined part of the e-mail address, encode each character, and store the result in a byte array. The storage format writes 0 or 1 into successive binary digits from left to right, starting at bit 3 of the first byte (bits are numbered from 0), until the codes of all characters of the user-defined part have been written;
C2. Because the last byte of the byte array obtained in step C1 is not necessarily completely filled, the first three binary digits of the first byte are used to indicate the highest bit position occupied in the last byte.
Embodiment 2
This embodiment is based on Embodiment 1. The e-mail address desensitization algorithm it provides comprises the following steps:
Step A (splitting) -> Step B (coding) -> Step C (compression)
Each step is described below.
Step A: splitting
First, split the address at the @ symbol into the user name V1 and the domain name V2. For example, for wang@sina.com, V1 is wang and V2 is sina.com. Depending on business requirements, both V1 and V2 can be desensitized, or only V1. Here the principle is illustrated by desensitizing only the user name V1.
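Step A amounts to a single string split. The sketch below assumes a hypothetical helper `split_email` and splits at the last @ symbol; the validation of empty parts is an added safeguard that the source does not mention.

```python
def split_email(address: str):
    """Split an e-mail address into (V1 user name, V2 domain name) at the @ symbol."""
    user, sep, domain = address.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not a valid e-mail address: {address!r}")
    return user, domain


v1, v2 = split_email("wang@sina.com")   # v1 is the user name, v2 is the domain name
```

Only `v1` is passed on to the coding and compression steps when the domain name is left untouched.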
Step B: coding; Step C: compression
The characters allowed in an e-mail address fall into five classes: letters, digits, the full stop (.), the hyphen (-) and the underscore (_), 39 characters in total (English letters are case-insensitive). Their frequency distribution is markedly uneven: in statistics from our bank's system, the six most frequent letters together account for more than 50% of occurrences. The present invention therefore uses the Huffman coding algorithm, which is widely applied in the compression field, to compress and store e-mail addresses effectively.
Summary of the Huffman coding principle: Huffman coding is a prefix-free variable-length code. Its basic principle is to represent the most frequent character with the shortest code and the least frequent character with the longest code. For example, for the string aaabbc with the coding table a -> 1, b -> 01, c -> 00, aaabbc is represented as 111010100, 9 binary digits (bits) in total, which needs two bytes of storage, whereas the original character string needs 6 bytes (each character occupies one byte).
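The aaabbc example can be checked in a few lines of Python. The coding table is taken from the text; the helper `is_prefix_free` is a hypothetical name added here to show the property that makes decoding unambiguous.

```python
import math

table = {"a": "1", "b": "01", "c": "00"}     # coding table from the example in the text
text = "aaabbc"

bits = "".join(table[ch] for ch in text)     # concatenate the code of each character
encoded_bytes = math.ceil(len(bits) / 8)     # storage is allocated in whole bytes
original_bytes = len(text)                   # one byte per character


def is_prefix_free(codes):
    """True if no code is a prefix of a different code."""
    codes = list(codes)
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


prefix_free = is_prefix_free(table.values())
```

The encoded string is 9 bits long and fits in two bytes, against six bytes for the original text, and because the table is prefix-free the bit string can be split back into codes in only one way.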
In the present invention, before desensitization starts, the Huffman coding table (a binary tree structure) is first created with the widely used algorithm, summarized as follows:
1. Obtain, by counting or sampling, the frequency with which each character occurs in the user names of the e-mail addresses in the system. Create 39 nodes for the 39 characters that may appear in an e-mail address. Each node contains the frequency value of its character. Sort the nodes by frequency value from small to large and store them in a node array:
N1, N2, N3, ..., N39
2. Remove the first two nodes N1 and N2 from the array, add the frequencies of the two nodes to create a new node P3, and make N1 and N2 the two child nodes of P3.
3. Insert P3 back into the array at the position given by its frequency (the sum of the frequencies of N1 and N2), keeping the array sorted from small to large.
4. Repeat steps 2 and 3 until only one node R remains in the array. R is the root node of the binary tree, and the leaf nodes correspond to the 39 characters that may appear in an e-mail address.
5. Starting from R, traverse each child node: append 0 to the path for a left child and 1 for a right child, until a leaf node is reached. The sequence of 0s and 1s on the path is the code of the letter represented by that leaf node.
6. Store the code of each letter in a hash table T for use in later steps.
7. Note: because characters repeat more often in domain names than in user names, if the domain name of the e-mail address also needs to be desensitized, the suggested approach is to sample the character frequencies of domain names separately and create another coding table.
After the coding table is obtained, the present invention uses it to encode V1 obtained in step A and then stores the code in a special format. The specific steps are as follows:
1. Traverse the characters of V1. For each character C, look up its code in the hash table T obtained in the previous step and store it in a byte array M1. The storage format writes 0 or 1 into successive binary digits from left to right, starting at bit 3 of the first byte (bit positions are numbered from 0).
For example, if the first character is encoded as 011, the first byte of the byte array M1 is now:
Bit position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Binary value | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
If the second character is encoded as 1110, then after that character is stored the byte array M1 is:
Bit position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Binary value | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The codes of all characters in V1 are written in turn in this way.
2. After every code of V1 has been written, the last byte of the byte array obtained in step 1 is not necessarily completely filled. In the example in step 1, after the two characters have been written, the last byte actually uses only bits 0 and 1. In practice the highest occupied position of the last byte can be anything from 0 to 7, so the present invention uses the first three binary digits of the first byte (referred to below as the "position indicator") to record the highest bit position occupied in the last byte (three binary digits can represent exactly 0-7). In the example in step 1, the highest occupied position of the last byte is 1, so the first three bits of the first byte are 001 and the final byte array M1 is:
Bit position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Binary value | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The properties of Huffman coding together with the special storage format of the invention guarantee that the resulting binary code corresponds one-to-one with the original character string, and the format needs only 3 additional binary digits (bits) per character string, so it is efficient.
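The one-to-one property can be illustrated by reversing the storage format. The sketch below is an assumption-laden illustration: the function names `unpack_bits` and `decode` are hypothetical, and the characters x and y stand in for the two unnamed characters whose codes are 011 and 1110 in the example of step 1.

```python
def unpack_bits(data: bytes) -> str:
    """Return the code bits stored after the 3-bit position indicator."""
    last_used = data[0] >> 5                          # highest occupied bit of the last byte
    bits = "".join(f"{byte:08b}" for byte in data)    # whole array as a bit string
    end = (len(data) - 1) * 8 + last_used + 1         # one past the last real code bit
    return bits[3:end]                                # skip the indicator, drop the padding


def decode(bits: str, table: dict) -> str:
    """Decode a bit string with a prefix-free table (character -> code)."""
    reverse = {code: ch for ch, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:                        # a complete code has been read
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


# Byte array from the example in step 1: 00101111 10000000.
m1 = bytes([0b00101111, 0b10000000])
payload = unpack_bits(m1)
restored = decode(payload, {"x": "011", "y": "1110"})  # x and y are placeholder characters
```

Because the position indicator marks exactly where the code bits end, the zero padding in the last byte is never mistaken for part of a code, which is what keeps the mapping reversible.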
Embodiment 3
This embodiment is based on Embodiments 1 and 2 and desensitizes chen@sina.com. First, a Huffman coding table is created by sampling; part of the table is as follows:
Character | Coding | Character | Coding |
c | 1011 | h | 10000 |
e | 01100 | n | 010 |
Step A: splitting
Splitting chen@sina.com gives V1: chen
Step B: coding; Step C: compression
Encode V1. First insert the code of c, 1011, as follows
(the computer stores data in units of 8 bits, so bit 7 is filled with 0; the same applies below):
Bit position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Binary value | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
Then insert the code of h, as follows:
Bit position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Binary value | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Then insert the code of e:
Bit position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Binary value | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Finally insert the code of n and update the position indicator (the highest occupied position of the last byte is 3, so 3, i.e. 011, is written to the position indicator):
Bit position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Binary value | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
That is, the compressed binary code M1 of chen is
011101110000011000100000
Step D: binary conversion
Convert M1 with Base64 encoding.
After conversion, M1 gives R1: dwYg
Step E: splicing the mailbox
Concatenate R1 obtained in step D with the original domain name; the final desensitized value is dwYg@sina.com
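The whole example can be reproduced end to end with Python's standard `base64` module. This is a minimal sketch: the function name `desensitize` is hypothetical, the table contains only the four codes listed in this embodiment, and the bit string is assembled with string operations for readability rather than with the shift, AND and OR operations a production implementation would likely use.

```python
import base64

# Partial coding table from this embodiment; a real table would cover all 39 characters.
TABLE = {"c": "1011", "h": "10000", "e": "01100", "n": "010"}


def desensitize(address: str) -> str:
    """Steps A-E: split, Huffman-encode, pack with indicator, Base64, re-attach domain."""
    user, _, domain = address.partition("@")            # Step A
    payload = "".join(TABLE[ch] for ch in user)         # Step B
    last_used = (3 + len(payload) - 1) % 8              # highest occupied bit of the last byte
    bits = f"{last_used:03b}" + payload                 # Step C: indicator + code bits
    bits += "0" * (-len(bits) % 8)                      # pad the last byte with zeros
    packed = int(bits, 2).to_bytes(len(bits) // 8, "big")
    r1 = base64.b64encode(packed).decode("ascii")       # Step D
    return r1 + "@" + domain                            # Step E


result = desensitize("chen@sina.com")
```

For the three bytes 0x77 0x06 0x20 the standard Base64 alphabet yields `dwYg`. Base64 is case-sensitive, so the letter case of the result must be preserved for the value to remain decodable.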
Embodiment 4
A system for desensitizing sensitive information in e-mail addresses comprises a splitting module, a character coding module and a compression module, wherein:
The splitting module is used to split the e-mail address, by character, into a user-defined part and a server-defined part;
The character coding module is used to encode, by the Huffman coding method, the user-defined part of the e-mail address produced by the splitting module;
The compression module is used to traverse the characters of the user-defined part of the e-mail address and to store the coding result produced by the character coding module in a byte array.
Data desensitized by the present invention hides sensitive information well, and the original data cannot easily be recovered even from a large amount of desensitized data. The coding process of the invention can be carried out entirely with binary operations such as shift, AND and OR, which current CPUs execute very efficiently. Because the Huffman coding table is derived from actual production data, encoding every mailbox in the database with it yields binary data that saves about 35% of the original space.
The specific embodiments described above explain the purpose, technical solutions and beneficial effects of the present invention in further detail. It should be understood that the foregoing is merely a specific embodiment of the invention and does not limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (9)
1. A coding and storage method applied to e-mail addresses, characterized in that it is carried out in the following steps:
A. Splitting: split the e-mail address, by character, into a user-defined part and a server-defined part;
B. Coding: encode the user-defined part of the e-mail address by the Huffman coding method;
C. Compression: traverse the characters of the user-defined part of the e-mail address and store the coding result obtained in step B in a byte array.
2. The coding and storage method applied to e-mail addresses according to claim 1, characterized in that the detailed process of step A is: split the characters of the e-mail address into a user name and a domain name at the @ symbol, the part before the @ symbol being the user name, i.e. the user-defined part, and the part after the @ symbol being the domain name, i.e. the server-defined part.
3. The coding and storage method applied to e-mail addresses according to claim 1, characterized in that step B is specifically: count the frequency of occurrence of each character of the user-defined part in e-mail addresses, sort all characters by frequency of occurrence from high to low to create a ranking table, create a Huffman coding table according to the order of the ranking table, and encode the user-defined part of the e-mail address according to the Huffman coding table.
4. The coding and storage method applied to e-mail addresses according to claim 3, characterized in that the characters comprise any combination of English letters, digits, the full stop, the hyphen and the underscore.
5. The coding and storage method applied to e-mail addresses according to claim 4, characterized in that the process of creating the Huffman coding table from the ranking table in step B is specifically:
B1. Obtain, by counting or sampling, the frequency of occurrence of each character in the user names of e-mail addresses; create 39 nodes for the 39 characters that may appear in an e-mail address (English letters, digits, the full stop, the hyphen and the underscore), each node containing the frequency value of its character; sort the nodes by frequency value from small to large and store them in a node array, denoted N1, N2, N3, ..., N39;
B2. Remove the first two nodes N1 and N2 from the node array, add their frequencies to create a new node P3, and make N1 and N2 the two child nodes of P3;
B3. Insert P3 back into the node array at the position given by its frequency, keeping the array sorted from small to large, where the frequency of P3 is the sum of the frequencies of N1 and N2;
B4. Repeat steps B2 and B3 until only one node R remains in the node array, R being the root node of the binary tree and the leaf nodes corresponding to the 39 characters that may appear in an e-mail address;
B5. Starting from R, traverse each child node: append 0 to the path for a left child and 1 for a right child, until a leaf node is reached, the sequence of 0s and 1s on the path being the code of the character represented by that leaf node;
B6. Store the code of each character in a hash table T for subsequent use.
6. The coding and storage method applied to e-mail addresses according to claim 1, characterized in that step B further comprises: encoding the server-defined part of the e-mail address by the Huffman coding method.
7. The coding and storage method applied to e-mail addresses according to claim 6, characterized in that the process in step B of encoding the server-defined part of the e-mail address by the Huffman coding method is specifically: count the frequency of occurrence of each character of the server-defined part in e-mail addresses, sort all characters by frequency of occurrence from high to low to create a ranking table, create a Huffman coding table according to the order of the ranking table, and encode the server-defined part of the e-mail address according to the Huffman coding table.
8. The coding and storage method applied to e-mail addresses according to claim 1, characterized in that the detailed process in step C of storing the obtained coding result in a byte array is:
C1. Traverse the characters of the user-defined part of the e-mail address, encode each character, and store the result in a byte array, the storage format writing 0 or 1 into successive binary digits from left to right, starting at bit 3 of the first byte, until the codes of all characters of the user-defined part have been written;
C2. Because the last byte of the byte array obtained in step C1 is not necessarily completely filled, the first three binary digits of the first byte are used to indicate the highest bit position occupied in the last byte.
9. A code storage system applied to e-mail addresses, characterized in that it comprises a splitting module, a character coding module and a compression module, wherein:
The splitting module is used to split the e-mail address, by character, into a user-defined part and a server-defined part;
The character coding module is used to encode, by the Huffman coding method, the user-defined part of the e-mail address produced by the splitting module;
The compression module is used to traverse the characters of the user-defined part of the e-mail address and to store the coding result produced by the character coding module in a byte array.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910091867.5A CN109831544B (en) | 2019-01-30 | 2019-01-30 | Code storage method and system applied to email address |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109831544A true CN109831544A (en) | 2019-05-31 |
CN109831544B CN109831544B (en) | 2021-10-08 |
Family
ID=66862950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910091867.5A Active CN109831544B (en) | 2019-01-30 | 2019-01-30 | Code storage method and system applied to email address |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109831544B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111506781A (en) * | 2020-04-21 | 2020-08-07 | 四川创智联恒科技有限公司 | Method, system, terminal device and readable storage medium for greatly compressing volume of database |
CN113301175A (en) * | 2020-07-14 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Service calling method, data storage method, device, equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE69332253D1 (en) * | 1992-10-13 | 2002-10-02 | Nec Corp | Decoding circuit for Huffman codes |
CN101022552A (en) * | 2007-03-13 | 2007-08-22 | 北京中星微电子有限公司 | Method and device for realizing Hoffman decodeng |
US20130181851A1 (en) * | 2012-01-17 | 2013-07-18 | Fujitsu Limited | Encoding method, encoding apparatus, decoding method, decoding apparatus, and system |
CN104283568A (en) * | 2013-07-12 | 2015-01-14 | 中国科学院声学研究所 | Data compressed encoding method based on partial Huffman tree |
CN106203139A (en) * | 2016-07-13 | 2016-12-07 | 成都知道创宇信息技术有限公司 | A kind of data local desensitization method |
CN104283567B (en) * | 2013-07-02 | 2018-07-03 | 北京四维图新科技股份有限公司 | A kind of compression of name data, decompression method and equipment |
CN109120273A (en) * | 2018-08-29 | 2019-01-01 | 重庆物奇科技有限公司 | Code device, code translator and system based on huffman coding |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111506781A (en) * | 2020-04-21 | 2020-08-07 | 四川创智联恒科技有限公司 | Method, system, terminal device and readable storage medium for greatly compressing volume of database |
CN113301175A (en) * | 2020-07-14 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Service calling method, data storage method, device, equipment and storage medium |
CN113301175B (en) * | 2020-07-14 | 2022-04-12 | 阿里巴巴集团控股有限公司 | Service calling method, data storage method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109831544B (en) | 2021-10-08 |
Similar Documents
Publication | Title |
---|---|
CN106852185B (en) | Parallelly compressed encoder based on dictionary |
CN104579360B (en) | A kind of method and apparatus of data processing |
CN106202172B (en) | Text compression methods and device |
US20130141259A1 (en) | Method and system for data compression |
CN101283349B (en) | Compressing language models with Golomb coding |
CN103995887A (en) | Bitmap index compressing method and bitmap index decompressing method |
CN108717461B (en) | Mass data structuring method and device, computer equipment and storage medium |
Ganguly et al. | pBWT: Achieving succinct data structures for parameterized pattern matching and related problems |
GB2523937A (en) | Method and device for mining data regular expression |
CN109831544A (en) | A kind of coding and storing method and system applied to E-mail address |
WO2015067996A1 (en) | Methods and apparatuses of digital data processing |
Haj Rachid et al. | A practical and scalable tool to find overlaps between sequences |
Iliopoulos et al. | A new efficient algorithm for computing the longest common subsequence |
Nakashima et al. | Constructing LZ78 tries and position heaps in linear time for large alphabets |
CN110825919B (en) | ID data processing method and device |
CN109660262A (en) | A kind of character coding method and system applied to E-mail address |
CN105938469B (en) | Coding and storing method, text storing data structure and Text compression storage and statistics output method |
CN106571909A (en) | Data encryption method and device |
CN106452451A (en) | Data processing method and device |
CN109829335A (en) | A kind of method and system applied to the desensitization of E-mail address sensitive information |
Mishra et al. | Fast pattern matching in compressed text using wavelet tree |
Gagie et al. | Compressing and indexing aligned readsets |
CN108399152A (en) | Compression expression method, system, storage medium and the rule match device of digital search tree |
CN103138766A (en) | Method and device of compression and decompression of data |
WO2013159156A1 (en) | Method for storing and applying related sets of pattern/message rules |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |