CN107832345A - Method for uniquely identifying base station data with a numeric identifier - Google Patents
Method for uniquely identifying base station data with a numeric identifier
- Publication number
- CN107832345A
- Authority
- CN
- China
- Prior art keywords
- base station
- unique
- mark
- field
- digital
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/22—Indexing; Data structures therefor; Storage structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention provides a method for uniquely identifying base station data with a numeric identifier, comprising the following steps. Step 1: take as input the name or abbreviation of the base station together with any other fields that can serve as a unique identifier; a string-type field must be of fixed length, and its characters must come from a limited, enumerable character set and must not repeat. Step 2: convert each fixed-length string field into a unique number. Step 3: repeat step 2 until every string field has been converted into a unique number, then concatenate all of the converted numbers. Step 4: the numeric identifier is reversible; if a lookup needs a human-readable result, the unique identifier can be converted back into the base station name and the time of occurrence. The invention is easy to implement and effectively improves retrieval efficiency. Applied to the data sets that surveying and mapping base stations of all kinds upload on a schedule, it reduces the traffic consumed by repeated uploads and the space wasted in data storage, and it also reduces the time spent on retrieval.
Description
Technical field
The present invention relates to the technical field of software development, and in particular to the field of big data storage, retrieval and cleansing.
Background
In recent years, as technology has developed, the demand for all kinds of precise positioning services has become increasingly urgent, and the volume of data that each traditional base station uploads on a schedule has exploded. Big data cleansing and retrieval now place high demands on the unique identification of massive numbers of base stations. A conventional string identifier consumes CPU resources during retrieval and also makes retrieval slow. Even when the string identifier is indexed in a traditional database, the cost of the index rises sharply as the data volume grows. Worse, when base station data is represented purely by a numeric identifier, the identifier is not reversible and cannot express the information the base station originally carried, so retrieval has to query other relation tables to obtain the useful information, which raises retrieval cost and lowers retrieval efficiency.
Summary of the invention
The technical problem solved by the present invention is to convert a traditional base station data identifier into a unique and reversible numeric identifier. A single encoding pass embeds the information in the identifier, and decoding likewise recovers the basic information of the base station that took part in the unique identifier.
The technical solution adopted by the present invention is as follows:
A method for uniquely identifying base station data with a numeric identifier, comprising the following steps:
Step 1: take as input the name or abbreviation of the physical base station together with any other fields that can serve as a unique identifier. A string-type field must be of fixed length, every character in it must come from an enumerable, limited character set, and no character may repeat.
Step 2: convert each fixed-length string-type field into a unique numeric identifier.
Step 3: repeat step 2 until all string fields have been converted into unique numeric identifiers, then concatenate all of the converted numeric identifiers.
Step 4: the numeric identifier is reversible; if a lookup needs a human-readable result, the unique identifier can be converted back into the base station name and broadcast time.
The beneficial effects of the present invention are as follows:
1. Retrieval efficiency is improved. In a traditional database, fetching a numeric identifier has to be followed by an additional lookup; the present invention retrieves all of the main identifying information in a single lookup.
2. Storage efficiency is improved. Encoding to a specific length reduces the storage space taken by string-type unique identifiers.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the present invention.
Detailed description
The present invention is further described below with reference to the accompanying drawing and an embodiment. Fig. 1 is a schematic flow chart of the method of the present invention for uniquely identifying base station data with a numeric identifier, which rests on the following definitions:
Definition 1 (determining the input data): any fixed-length string field or numeric field defined for a base station can serve as input.
Definition 2 (converting a string field into a unique numeric identifier): for a string of fixed length, the set of characters it can contain can essentially be enumerated. The description below assumes that the field is a fixed-length string whose characters are chosen from a limited set (a-z or A-Z). Converting such a string into a unique numeric identifier can be viewed as taking all fixed-length permutations of those characters and assigning a number to each permutation. The conversion formula is the Cantor expansion, a special kind of hash function that is used to compress and store states that are permutations of a set of numbers:
X = a_n * (n-1)! + a_{n-1} * (n-2)! + ... + a_i * (i-1)! + ... + a_2 * 1! + a_1 * 0!
Here n is the length of the fixed-length string, a_n belongs to the first character and a_1 to the last, and each a_i is the rank (counting from 0) of its character among the elements that have not yet appeared.
For example:
For the base station code "SFGA" of fixed length 4, the conversion formula is:
X("SFGA") = a_4 * 3! + a_3 * 2! + a_2 * 1! + a_1 * 0!
a_4 belongs to "S": its value is the rank of S in the limited array [S, F, G, A]. Comparing ASCII codes shows that three characters are smaller than S, so S ranks 3rd counting from 0 and a_4 = 3. a_3 belongs to "F": one character (A) is smaller than F, so a_3 = 1. a_2 belongs to "G": two characters (A and F) are smaller than G, but F has already appeared, so a_2 = 1. a_1 belongs to "A", the only character left, so a_1 = 0.
X("SFGA") = 3 * 3! + 1 * 2! + 1 * 1! + 0 * 0! = 18 + 2 + 1 + 0 = 21
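The expansion in Definition 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's own implementation: the function name `cantor_encode` is hypothetical, and the sketch assumes the input obeys the stated restriction that no character repeats.

```python
from math import factorial


def cantor_encode(code: str) -> int:
    """Return the 0-based rank of `code` among the permutations of its characters.

    Hypothetical helper for illustration; assumes no character repeats.
    """
    if len(set(code)) != len(code):
        raise ValueError("characters must not repeat")
    n = len(code)
    rank = 0
    for i, ch in enumerate(code):
        # Coefficient a: how many characters that have not appeared yet
        # (those to the right of position i) are smaller than ch.
        smaller = sum(1 for later in code[i + 1:] if later < ch)
        rank += smaller * factorial(n - 1 - i)
    return rank


print(cantor_encode("SFGA"))  # 21
```

The call reproduces the worked value X("SFGA") = 21. Characters are compared by code point, which matches the ASCII comparison used in the example.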
Definition 3 (the unique numeric identifier is reversible): given the number 21 and the character set [S, F, G, A], the string is recovered as follows. Because X is counted from 0, 21 is itself the number of permutations that precede the target string, so it is divided directly (1 would be subtracted first only if the permutations were numbered from 1).
1) Divide 21 by 3!: the quotient is 3 and the remainder is 3. Three characters are smaller than the 1st character, so the 1st character is S.
2) Divide 3 by 2!: the quotient is 1 and the remainder is 1. One of the remaining characters [F, G, A] is smaller than the 2nd character, so the 2nd character is F.
3) Divide 1 by 1!: the quotient is 1 and the remainder is 0. One of the remaining characters [G, A] is smaller than the 3rd character, so the 3rd character is G.
4) The last remaining element can only be A.
So the string is SFGA.
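A decoding sketch in Python follows. It assumes the identifier is the 0-based value X produced by the formula in Definition 2, so nothing is subtracted before dividing; the function name `cantor_decode` and its `alphabet` parameter are hypothetical names chosen for illustration.

```python
from math import factorial


def cantor_decode(rank: int, alphabet: str) -> str:
    """Rebuild the permutation of `alphabet` whose 0-based Cantor rank is `rank`.

    Hypothetical helper for illustration; `alphabet` lists the characters
    of the fixed-length field in any order, without repeats.
    """
    remaining = sorted(alphabet)  # unused characters, smallest first
    n = len(remaining)
    if not 0 <= rank < factorial(n):
        raise ValueError("rank is out of range for this alphabet")
    chars = []
    for i in range(n - 1, -1, -1):
        # The quotient says how many unused characters are smaller than the next one.
        index, rank = divmod(rank, factorial(i))
        chars.append(remaining.pop(index))
    return "".join(chars)


print(cantor_decode(21, "SFGA"))  # SFGA
```

For 21 the successive quotients are 3, 1, 1 and 0, which select S, F, G and A in turn from the sorted list of unused characters.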
Definition 4 (concatenating the unique numeric identifiers of several fields): several numeric identifiers can be concatenated into a new unique string, with each field encoded at a fixed width. For example, given two string fields, a base station time field and a base station code field, each can be converted into a numeric identifier according to Definition 2, and the maximum number of digits each needs can be estimated from the limited character set. For the base station abbreviation field with the enumerated array [S, F, G, A], for instance, the numeric identifier can be capped at 999, so the first three digits of the combined identifier represent the abbreviation field; similarly, the following 5 digits can be reserved for the base station's time field.
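The fixed-width concatenation can be sketched as follows. The helper names `splice_fields` and `split_fields` are hypothetical, and the time value 12345 is made up for illustration; only the 3-digit and 5-digit widths come from the text.

```python
def splice_fields(values, widths):
    """Zero-pad each numeric identifier to its fixed width and concatenate."""
    if len(values) != len(widths):
        raise ValueError("one width is required per value")
    parts = []
    for value, width in zip(values, widths):
        if not 0 <= value < 10 ** width:
            raise ValueError(f"{value} does not fit in {width} digits")
        parts.append(str(value).zfill(width))
    return "".join(parts)


def split_fields(identifier, widths):
    """Cut the combined identifier back into its numeric fields."""
    if len(identifier) != sum(widths):
        raise ValueError("identifier length does not match the field widths")
    values, pos = [], 0
    for width in widths:
        values.append(int(identifier[pos:pos + width]))
        pos += width
    return values


combined = splice_fields([21, 12345], [3, 5])
print(combined)                        # 02112345
print(split_fields(combined, [3, 5]))  # [21, 12345]
```

The combined identifier is kept as a string in this sketch so that leading zeros survive; because every field has a fixed width, it can be cut apart again without any separator.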
Take surveying and mapping physical base stations as an embodiment. Each base station broadcasts its status data every second, and a data record is usually identified uniquely by the base station abbreviation together with the broadcast timestamp. A base station abbreviation is generally 4 to 8 characters drawn from a limited character set. Encoding the abbreviation with the method of Definition 2 maps every such string identifier to a 3- to 5-digit number, and the timestamp is generally limited to 13 digits (a numeric timestamp accurate to the millisecond). With the concatenation of Definition 4, the base station unique identifier stays within 18 digits, which saves the space that storing the string would otherwise waste. With the decoding rule of Definition 3, the field contents that took part in the encoding can be recovered from the existing numeric identifier, which saves the time of an index lookup.
The main advantages of the present invention include:
1. The method reduces storage space. If it is implemented in a language that supports tail recursion, space reuse improves further and the space complexity of the implementation and of storage is reduced.
2. The invention improves operating efficiency. As described above, a single encoding pass improves the efficiency of searching for and locating data and lowers CPU usage, while a single decoding pass yields the content that took part in the encoding and avoids unnecessary lookups.
3. The invention is easy to implement, which reduces implementation complexity and maintenance cost. The method encodes and decodes with the common Cantor expansion, so the barrier to implementation is low and the method is easy to test and maintain.
Although the present invention is disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Any person skilled in the art may, without departing from the spirit and scope of the present invention, use the methods and technical content disclosed above to make possible variations and modifications to the technical solution of the invention. Therefore, any simple modification, equivalent change or refinement made to the above embodiment according to the technical essence of the present invention, without departing from the content of its technical solution, falls within the scope of protection of the technical solution of the present invention.
Claims (7)
- 1. A method for uniquely identifying base station data with a numeric identifier, characterized by comprising the following steps: Step 1, taking fields of the base station as input; Step 2, converting a field into a unique numeric identifier; Step 3, repeating step 2 until all fields have been converted into unique numeric identifiers, and concatenating all of the converted numeric identifiers; Step 4, the numeric identifier being reversible.
- 2. The method for uniquely identifying base station data with a numeric identifier according to claim 1, characterized in that the fields include string-type fields or numeric fields of the base station.
- 3. The method for uniquely identifying base station data with a numeric identifier according to claim 2, characterized in that the string-type fields include the name or abbreviation of the base station.
- 4. The method for uniquely identifying base station data with a numeric identifier according to claim 2, characterized in that the length of a string-type field is fixed, and each character of the string comes from an enumerable set and does not repeat.
- 5. The method for uniquely identifying base station data with a numeric identifier according to claim 4, characterized in that a string-type field is converted into a unique numeric identifier by Cantor expansion.
- 6. The method for uniquely identifying base station data with a numeric identifier according to claim 1, characterized in that in step 2 the fields are converted into a unique numeric identifier from the base station abbreviation and the broadcast timestamp.
- 7. The method for uniquely identifying base station data with a numeric identifier according to claim 1, characterized in that in step 4, where the numeric identifier is reversible, the unique numeric identifier is converted into the base station name and the broadcast time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710960854.8A CN107832345A (en) | 2017-10-16 | 2017-10-16 | Method for uniquely identifying base station data with a numeric identifier |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710960854.8A CN107832345A (en) | 2017-10-16 | 2017-10-16 | Method for uniquely identifying base station data with a numeric identifier |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107832345A (en) | 2018-03-23 |
Family
ID=61648129
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710960854.8A Pending CN107832345A (en) | Method for uniquely identifying base station data with a numeric identifier | 2017-10-16 | 2017-10-16 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107832345A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110647525A (en) * | 2018-12-29 | 2020-01-03 | 北京奇虎科技有限公司 | Base station data storage method and device |
CN112232025A (en) * | 2019-06-26 | 2021-01-15 | 杭州海康威视数字技术股份有限公司 | Character string storage method and device and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101043353A (en) * | 2006-03-25 | 2007-09-26 | 中兴通讯股份有限公司 | Process for improving data-handling efficiency of network management system |
CN101883111A (en) * | 2010-06-25 | 2010-11-10 | 中兴通讯股份有限公司 | Accounting server for processing online business log and method thereof |
CN102750268A (en) * | 2012-06-19 | 2012-10-24 | 山东中创软件商用中间件股份有限公司 | Object serializing method as well as object de-serializing method, device and system |
CN103279544A (en) * | 2013-06-05 | 2013-09-04 | 中国电子科技集团公司第十五研究所 | Method and device for storing and inquiring tree structure data in relational database |
CN106777292A (en) * | 2016-12-29 | 2017-05-31 | 北京神州绿盟信息安全科技股份有限公司 | A kind of Data Serialization method and device |
-
2017
- 2017-10-16 CN CN201710960854.8A patent/CN107832345A/en active Pending
Non-Patent Citations (1)
Title |
---|
Joseph Galante: "Generalized Cantor Expansions", Rose-Hulman Undergraduate Mathematics Journal *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110647525A (en) * | 2018-12-29 | 2020-01-03 | 北京奇虎科技有限公司 | Base station data storage method and device |
CN110647525B (en) * | 2018-12-29 | 2022-06-10 | 北京奇虎科技有限公司 | Base station data storage method and device |
CN112232025A (en) * | 2019-06-26 | 2021-01-15 | 杭州海康威视数字技术股份有限公司 | Character string storage method and device and electronic equipment |
CN112232025B (en) * | 2019-06-26 | 2023-11-03 | 杭州海康威视数字技术股份有限公司 | Character string storage method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108959386B (en) | Distributed global unique ID generation method, device, equipment and storage medium | |
CN105260354B (en) | A kind of Chinese AC automatic machines working method based on keyword dictionary tree construction | |
US4782325A (en) | Arrangement for data compression | |
US8325721B2 (en) | Method for selecting hash function, method for storing and searching routing table and devices thereof | |
CN100417028C (en) | Method of performing huffman decoding | |
CN107094021A (en) | Data compression | |
KR960703296A (en) | Method and apparatus for decoding orthogonally encoded data signals (RECEIVER FOR A DIRECT SEQUENCE SPREAD SPECTRUM ORTHOGONALLY ENCODED SIGNAL EMPLOYING RAKE PRINCIPLE) | |
CN105183788A (en) | Operation method for Chinese AC automatic machine based on retrieval of keyword dictionary tree | |
CN100525450C (en) | Method and device for realizing Hoffman decodeng | |
CN107634765B (en) | A kind of Internet of Things coding method and system | |
Fraigniaud et al. | Local MST computation with short advice | |
CN104636477B (en) | The De-weight method of push list before a kind of information push | |
CN103460209A (en) | Method of encoding a data identifier | |
CN103365991A (en) | Method for realizing dictionary memory management of Trie tree based on one-dimensional linear space | |
CN107832345A (en) | Method for uniquely identifying base station data with a numeric identifier | |
CN112256821B (en) | Chinese address completion method, device, equipment and storage medium | |
CN101551820B (en) | Generation method and apparatus for index database of points of interest attribute | |
CN100578943C (en) | Optimized Huffman decoding method and device | |
CN100498794C (en) | Method and device for compressing index | |
CN1538329A (en) | Searching method of calalogue of stored items and its device | |
Navarro et al. | Faster top-k document retrieval in optimal space | |
Navarro et al. | New space/time tradeoffs for top-k document retrieval on sequences | |
CN109446198B (en) | Trie tree node compression method and device based on double arrays | |
CN104301182B (en) | A kind of querying method and device of the exception information of website visiting at a slow speed | |
CN116208667A (en) | Variable-length high-compression bit message coding and decoding method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180323 |