CN107832345A - The method of base station data unique numberization mark - Google Patents

The method of base station data unique numberization mark

Info

Publication number
CN107832345A
Authority
CN
China
Prior art keywords
base station
unique
mark
field
digital
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710960854.8A
Other languages
Chinese (zh)
Inventor
万景琨
高山岳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qianxun Position Network Co Ltd
Original Assignee
Qianxun Position Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-10-16
Filing date
2017-10-16
Publication date
2018-03-23
Application filed by Qianxun Position Network Co Ltd
Priority to CN201710960854.8A
Publication of CN107832345A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a method for uniquely identifying base station data with numeric identifiers, comprising the following steps. Step 1: take as input the name or abbreviation of a base station together with any other fields that can serve as a unique identifier; a string-type field must have a fixed length, each of its characters must come from a limited, enumerable character set, and no character may repeat. Step 2: convert each fixed-length string field into a unique number. Step 3: repeat Step 2 until every string field has been converted into a unique number, then concatenate all of the converted numbers. Step 4: the numeric identifier is reversible; if a lookup requires a human-readable form, the unique identifier can be converted back into the base station name and the time of occurrence. The invention is easy to implement and effectively improves retrieval efficiency. It applies only to data sets that surveying and mapping base stations of all kinds upload on a schedule. It reduces the traffic consumed by repeated uploads and the storage space wasted on redundant data, and it also reduces the time spent on retrieval.

Description

The method of base station data unique numberization mark
Technical field
The present invention relates to the technical field of software development, and in particular to the field of big data storage, retrieval and cleansing.
Background
In recent years, with the development of technology, the demand for all kinds of precise positioning services has become increasingly urgent, and the volume of data that conventional base stations upload on a schedule has exploded. In the field of big data cleansing and retrieval, the requirements for uniquely identifying a massive number of base stations are now fairly high. A conventional string identifier consumes CPU resources during retrieval and makes retrieval inefficient. Even when the string identifier is indexed, as in a traditional database, the cost of the index rises sharply as the data volume explodes. Many systems therefore represent base station data entirely with numeric identifiers, but such an identifier is not reversible and cannot express the information the base station originally carried. Retrieval then has to query other relational tables to obtain the useful information, which raises retrieval cost and lowers retrieval efficiency.
Summary of the invention
The technical problem solved by the present invention is to convert conventional base station data identifiers into unique and reversible numeric identifiers: a single encoding embeds the information in the identifier, and decoding likewise recovers the basic base station information that took part in forming the unique identifier.
The technical solution adopted by the present invention is as follows:
A method for uniquely identifying base station data with numeric identifiers, comprising the following steps:
Step 1: take as input the name or abbreviation of a physical base station together with any other fields that can serve as a unique identifier. A string-type field must have a fixed length, each of its characters must come from a limited, enumerable character set, and no character may repeat.
Step 2: convert each fixed-length string-type field into a unique numeric identifier.
Step 3: repeat Step 2 until every string field has been converted into a unique numeric identifier, then splice all of the converted numeric identifiers together.
Step 4: the numeric identifier is reversible; if a lookup requires a human-readable form, the unique identifier can be converted back into the base station name and the broadcast time.
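The input constraints of Step 1 can be checked before any encoding takes place. The following is a minimal sketch rather than the patent's own implementation; the function name `is_valid_field` and its parameters `charset` and `length` are assumptions introduced for illustration.

```python
def is_valid_field(value: str, charset: str, length: int) -> bool:
    """Check the Step 1 constraints on a string-type field.

    `charset` (the limited, enumerable character set) and `length`
    (the fixed field length) are hypothetical parameters.
    """
    if len(value) != length:
        return False                      # not the required fixed length
    if len(set(value)) != len(value):
        return False                      # some character repeats
    return set(value) <= set(charset)     # every character is in the limited set


print(is_valid_field("SFGA", "SFGA", 4))  # True
print(is_valid_field("SFGG", "SFGA", 4))  # False: G repeats
print(is_valid_field("SFG", "SFGA", 4))   # False: wrong length
```

Rejecting invalid fields early matters because the numeric conversion is only unique when these constraints hold.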
The beneficial effects of the present invention are as follows:
1. Retrieval efficiency is improved. With conventional database retrieval, fetching a numeric identifier also requires an additional lookup; the present invention retrieves all of the main unique information in a single lookup.
2. Storage efficiency is improved. Encoding at a specific length reduces the storage space needed for string-type unique identifiers.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the present invention.
Detailed description
The present invention is further described below with reference to the accompanying drawing and embodiments. Fig. 1 is a schematic flow chart of the method of the present invention for uniquely identifying base station data with numeric identifiers, which rests on the following definitions:
Definition 1: Determining the input data. Any fixed-length string field or numeric field defined for a base station can serve as input.
Definition 2: Converting a string field into a unique numeric identifier. For a fixed-length string, the set of characters it can contain is essentially enumerable; the description below assumes the field is a fixed-length string of non-repeating characters chosen from a-z or A-Z. Converting a string of fixed length over a limited character set into a unique numeric identifier can be viewed as enumerating all permutations of those characters at that fixed length and assigning a number to each permutation. The conversion uses the Cantor expansion, a special kind of hash function that is typically used to compress and store states that are permutations of a set of elements:
X = a_n*(n-1)! + a_{n-1}*(n-2)! + ... + a_i*(i-1)! + ... + a_2*1! + a_1*0!
where a_i is the rank (counting from 0) of the corresponding character among the characters that have not yet appeared, and n is the length of the fixed-length string.
For example:
For the base station code "SFGA" of fixed length 4, the conversion formula is:
X("SFGA") = a_4*3! + a_3*2! + a_2*1! + a_1*0!
a_4 corresponds to "S", so determine the rank of S within the limited set [S, F, G, A]. Comparing ASCII values, three characters are smaller than S, so S has rank 3 (counting from 0) and a_4 = 3. a_3 corresponds to "F": one character (A) is smaller than F, so a_3 = 1. a_2 corresponds to "G": two characters (A and F) are smaller than G, but F has already appeared, so a_2 = 1. a_1 corresponds to "A", the only remaining character, so a_1 = 0.
X("SFGA") = 3*3! + 1*2! + 1*1! + 0*0! = 18 + 2 + 1 + 0 = 21
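The worked example can be reproduced in code. This is a minimal sketch, assuming (as in the [S, F, G, A] example) that the string uses each character of the limited set exactly once; the function name `cantor_encode` and the parameter `charset` are illustrative and do not come from the patent.

```python
from math import factorial


def cantor_encode(s: str, charset: str) -> int:
    """Return the 0-based rank of `s` among all permutations of `charset`."""
    if len(set(s)) != len(s) or sorted(s) != sorted(charset):
        raise ValueError("s must use each character of charset exactly once")
    remaining = sorted(charset)          # unused characters in ASCII order
    n = len(s)
    rank = 0
    for i, ch in enumerate(s):
        a = remaining.index(ch)          # unused characters smaller than ch
        rank += a * factorial(n - 1 - i)
        remaining.remove(ch)             # ch has now appeared
    return rank


print(cantor_encode("SFGA", "SFGA"))     # 21
```

Keeping `remaining` sorted means the index of a character is exactly the count of smaller characters that have not yet appeared, which is the coefficient the formula calls for.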
Definition 3: The unique numeric identifier is reversible. Given the number 21 and the limited set [S, F, G, A], the string is recovered as follows:
1) Because ranks are counted from 0, the number 21 is itself the count of permutations that precede the target string, so it is used directly.
2) Divide 21 by 3!: the quotient is 3 and the remainder is 3. Three characters are smaller than the first character, so the first character is S.
3) Divide 3 by 2!: the quotient is 1 and the remainder is 1. One unused character is smaller than the second character, so among the remaining A, F and G the second character is F.
4) Divide 1 by 1!: the quotient is 1 and the remainder is 0. One unused character is smaller than the third character, so among the remaining A and G the third character can only be G.
5) The last remaining character can only be A.
So the string is SFGA.
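The reverse conversion can be sketched the same way. This sketch assumes the identifier is the 0-based rank produced by the expansion formula of Definition 2, so the value 21 is divided directly with no offset; `cantor_decode` and `charset` are illustrative names, not part of the patent.

```python
from math import factorial


def cantor_decode(rank: int, charset: str) -> str:
    """Return the permutation of `charset` whose 0-based rank is `rank`."""
    remaining = sorted(charset)          # unused characters in ASCII order
    n = len(remaining)
    if not 0 <= rank < factorial(n):
        raise ValueError("rank is out of range for this character set")
    chars = []
    for i in range(n - 1, -1, -1):
        a, rank = divmod(rank, factorial(i))   # a = count of smaller unused characters
        chars.append(remaining.pop(a))
    return "".join(chars)


print(cantor_decode(21, "SFGA"))         # SFGA
```

Each `divmod` yields one coefficient of the expansion as the quotient and carries the remainder to the next position.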
Definition 4: Splicing the unique numeric identifiers of multiple strings. Several numeric identifiers can be spliced into a new unique string by giving each field a fixed-width encoding. For example, suppose there are two string fields, the base station time and the base station code. Each can be converted into a numeric identifier according to Definition 2, and the maximum number of digits each can occupy can be estimated from its limited character set. For instance, with the enumerable set [S, F, G, A] for the base station abbreviation field, the largest numeric identifier can be capped at 999, so the first three digits of the identifier represent the abbreviation field; similarly, the following 5 digits can be reserved for the base station's time field.
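Fixed-width splicing can be sketched as zero-padded concatenation. This is an illustration under assumptions: the helper names `splice` and `split`, and the sample time value 86399, are hypothetical; only the 3-digit and 5-digit widths come from the text above.

```python
def splice(values, widths):
    """Concatenate non-negative integers, zero-padding each to its fixed width."""
    if len(values) != len(widths):
        raise ValueError("one width is required per value")
    parts = []
    for value, width in zip(values, widths):
        if not 0 <= value < 10 ** width:
            raise ValueError(f"{value} does not fit in {width} digits")
        parts.append(str(value).zfill(width))
    return "".join(parts)


def split(identifier, widths):
    """Cut a spliced identifier back into its integer fields."""
    if len(identifier) != sum(widths):
        raise ValueError("identifier length does not match the field widths")
    fields, pos = [], 0
    for width in widths:
        fields.append(int(identifier[pos:pos + width]))
        pos += width
    return fields


spliced = splice([21, 86399], [3, 5])    # abbreviation rank, then a time field
print(spliced)                           # 02186399
print(split(spliced, [3, 5]))            # [21, 86399]
```

The sketch returns a string so that leading zeros survive; if the identifier is stored as an integer instead, it has to be padded back to the full width before splitting.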
Take surveying and mapping physical base stations as an embodiment. Each base station broadcasts its status data every second, and uniqueness is usually established by using the base station abbreviation together with the broadcast timestamp as the unique identifier of the data. A base station abbreviation is generally 4 to 8 characters drawn from a limited character set. Encoding the abbreviation with the method of Definition 2 maps every base station abbreviation string to a 3- to 5-digit number, and the timestamp is generally limited to 13 digits (a numeric timestamp accurate to the millisecond). In this way the base station's unique identifier is kept within 18 digits (at most 5 digits for the abbreviation plus 13 for the timestamp), which saves the space that storing strings would otherwise waste. Following the decoding rule of Definition 3, the existing numeric identifier can also be decoded back into the field contents that originally took part in the encoding, which saves the time of an index lookup.
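The embodiment can be sketched end to end: encode the abbreviation, append a 13-digit millisecond timestamp, and decode both from the single number. This is a sketch under stated assumptions, not the patent's implementation: the names `make_id`, `parse_id` and `TS_DIGITS`, the choice of a plain integer as the identifier, and the sample timestamp are all hypothetical, and the abbreviation is assumed to use each character of its limited set exactly once.

```python
from math import factorial

TS_DIGITS = 13  # width of a millisecond timestamp, as stated in the embodiment


def encode_name(name: str, charset: str) -> int:
    """Cantor expansion: 0-based rank of `name` among permutations of `charset`."""
    if len(set(name)) != len(name) or sorted(name) != sorted(charset):
        raise ValueError("name must use each character of charset exactly once")
    remaining, rank = sorted(charset), 0
    for i, ch in enumerate(name):
        rank += remaining.index(ch) * factorial(len(name) - 1 - i)
        remaining.remove(ch)
    return rank


def decode_name(rank: int, charset: str) -> str:
    """Inverse of encode_name."""
    remaining, chars = sorted(charset), []
    for i in range(len(charset) - 1, -1, -1):
        a, rank = divmod(rank, factorial(i))
        chars.append(remaining.pop(a))
    return "".join(chars)


def make_id(name: str, ts_ms: int, charset: str) -> int:
    """Place the name rank in the high digits and the timestamp in the low 13."""
    if not 0 <= ts_ms < 10 ** TS_DIGITS:
        raise ValueError("timestamp must fit in 13 digits")
    return encode_name(name, charset) * 10 ** TS_DIGITS + ts_ms


def parse_id(identifier: int, charset: str):
    """Recover (name, timestamp) from an identifier built by make_id."""
    rank, ts_ms = divmod(identifier, 10 ** TS_DIGITS)
    return decode_name(rank, charset), ts_ms


uid = make_id("SFGA", 1508112000000, "SFGA")   # hypothetical timestamp
print(uid)                                     # 211508112000000
print(parse_id(uid, "SFGA"))                   # ('SFGA', 1508112000000)
```

An identifier of at most 18 decimal digits is below 10^18 and therefore fits in a signed 64-bit integer column, so it can be stored and indexed as a single number.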
The main advantages of the present invention include:
1. The method reduces storage space. If it is implemented in a language that supports tail recursion, space reuse improves further and the space complexity of implementation and storage is reduced.
2. The invention improves operating efficiency. As described above, a single encoding improves the efficiency of searching for and locating data and lowers CPU usage. Likewise, a single decoding yields the content that took part in the encoding, avoiding unnecessary lookups.
3. The invention is easy to implement, reducing both implementation complexity and maintenance cost. The method encodes and decodes with the common Cantor expansion, so the barrier to implementation is low and the method is easy to test and maintain.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the spirit and scope of the present invention, use the methods and technical content disclosed above to make possible variations and modifications to the technical solution of the present invention. Therefore, any simple modification, equivalent change or refinement made to the above embodiments in accordance with the technical essence of the present invention, without departing from the content of its technical solution, falls within the protection scope of the technical solution of the present invention.

Claims (7)

  1. A method for uniquely identifying base station data with numeric identifiers, characterized by comprising the following steps:
    Step 1: take the fields of a base station as input;
    Step 2: convert a field into a unique numeric identifier;
    Step 3: repeat Step 2 until all fields have been converted into unique numeric identifiers, and splice all of the converted numeric identifiers together;
    Step 4: the numeric identifier is reversible.
  2. The method according to claim 1, characterized in that the fields include string-type fields or numeric-type fields of the base station.
  3. The method according to claim 2, characterized in that the string-type field includes the name or abbreviation of the base station.
  4. The method according to claim 2, characterized in that the length of the string-type field is fixed, and each character of the string comes from an enumerable set and does not repeat.
  5. The method according to claim 4, characterized in that the string-type field is converted into a unique numeric identifier by means of the Cantor expansion.
  6. The method according to claim 1, characterized in that in Step 2 the fields are converted into a unique numeric identifier by means of the base station abbreviation and the broadcast timestamp.
  7. The method according to claim 1, characterized in that in reversing the numeric identifier in Step 4, the unique numeric identifier is converted into the base station name and the broadcast time.
CN201710960854.8A 2017-10-16 2017-10-16 The method of base station data unique numberization mark Pending CN107832345A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710960854.8A CN107832345A (en) 2017-10-16 2017-10-16 The method of base station data unique numberization mark

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710960854.8A CN107832345A (en) 2017-10-16 2017-10-16 The method of base station data unique numberization mark

Publications (1)

Publication Number Publication Date
CN107832345A true CN107832345A (en) 2018-03-23

Family

ID=61648129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710960854.8A Pending CN107832345A (en) 2017-10-16 2017-10-16 The method of base station data unique numberization mark

Country Status (1)

Country Link
CN (1) CN107832345A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647525A (en) * 2018-12-29 2020-01-03 北京奇虎科技有限公司 Base station data storage method and device
CN112232025A (en) * 2019-06-26 2021-01-15 杭州海康威视数字技术股份有限公司 Character string storage method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101043353A (en) * 2006-03-25 2007-09-26 中兴通讯股份有限公司 Process for improving data-handling efficiency of network management system
CN101883111A (en) * 2010-06-25 2010-11-10 中兴通讯股份有限公司 Accounting server for processing online business log and method thereof
CN102750268A (en) * 2012-06-19 2012-10-24 山东中创软件商用中间件股份有限公司 Object serializing method as well as object de-serializing method, device and system
CN103279544A (en) * 2013-06-05 2013-09-04 中国电子科技集团公司第十五研究所 Method and device for storing and inquiring tree structure data in relational database
CN106777292A (en) * 2016-12-29 2017-05-31 北京神州绿盟信息安全科技股份有限公司 A kind of Data Serialization method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JOSEPH GALANTE: "《Generalized Cantor Expansions》", 《ROSE-HULMAN UNDERGRADUATE MATHEMATICS JOURNAL》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647525A (en) * 2018-12-29 2020-01-03 北京奇虎科技有限公司 Base station data storage method and device
CN110647525B (en) * 2018-12-29 2022-06-10 北京奇虎科技有限公司 Base station data storage method and device
CN112232025A (en) * 2019-06-26 2021-01-15 杭州海康威视数字技术股份有限公司 Character string storage method and device and electronic equipment
CN112232025B (en) * 2019-06-26 2023-11-03 杭州海康威视数字技术股份有限公司 Character string storage method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN108959386B (en) Distributed global unique ID generation method, device, equipment and storage medium
CN105260354B (en) A kind of Chinese AC automatic machines working method based on keyword dictionary tree construction
US4782325A (en) Arrangement for data compression
US8325721B2 (en) Method for selecting hash function, method for storing and searching routing table and devices thereof
CN100417028C (en) Method of performing huffman decoding
CN107094021A (en) Data compression
KR960703296A (en) Method and apparatus for decoding orthogonally encoded data signals (RECEIVER FOR A DIRECT SEQUENCE SPREAD SPECTRUM ORTHOGONALLY ENCODED SIGNAL EMPLOYING RAKE PRINCIPLE)
CN105183788A (en) Operation method for Chinese AC automatic machine based on retrieval of keyword dictionary tree
CN100525450C (en) Method and device for realizing Hoffman decodeng
CN107634765B (en) A kind of Internet of Things coding method and system
Fraigniaud et al. Local MST computation with short advice
CN104636477B (en) The De-weight method of push list before a kind of information push
CN103460209A (en) Method of encoding a data identifier
CN103365991A (en) Method for realizing dictionary memory management of Trie tree based on one-dimensional linear space
CN107832345A (en) The method of base station data unique numberization mark
CN112256821B (en) Chinese address completion method, device, equipment and storage medium
CN101551820B (en) Generation method and apparatus for index database of points of interest attribute
CN100578943C (en) Optimized Huffman decoding method and device
CN100498794C (en) Method and device for compressing index
CN1538329A (en) Searching method of calalogue of stored items and its device
Navarro et al. Faster top-k document retrieval in optimal space
Navarro et al. New space/time tradeoffs for top-k document retrieval on sequences
CN109446198B (en) Trie tree node compression method and device based on double arrays
CN104301182B (en) A kind of querying method and device of the exception information of website visiting at a slow speed
CN116208667A (en) Variable-length high-compression bit message coding and decoding method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180323
