CN108647266A - Rapid distributed storage and interaction method for heterogeneous data - Google Patents

Rapid distributed storage and interaction method for heterogeneous data

Info

Publication number
CN108647266A
Authority
CN
China
Prior art keywords
data
keyword
distributed storage
index table
heterogeneous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810399691.5A
Other languages
Chinese (zh)
Inventor
陈新碧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Bazemun Zhe Zhe Network Technology Co Ltd
Original Assignee
Chongqing Bazemun Zhe Zhe Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Bazemun Zhe Zhe Network Technology Co Ltd
Priority to CN201810399691.5A
Publication of CN108647266A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

A rapid distributed storage and interaction method for heterogeneous data. Data is stored dispersed across multiple independent devices using a scalable system structure, and the storage load is shared among multiple storage servers, which improves the reliability, availability, and access efficiency of the system and makes it easy to extend. The optimized query algorithm used by the invention adopts a keyword count sorting strategy, which shortens query time.

Description

Rapid distributed storage and interaction method for heterogeneous data
Technical field
The present invention relates to the technical field of data processing, and in particular to a method for distributed storage and real-time interactive processing of heterogeneous data.
Background
In the course of enterprise informatization, a large number of functional applications are integrated into the enterprise information portal system and must be managed in a centralized, unified way to meet the needs of applications that share data. These functional applications differ from one another in many respects, including development language, development platform, operating system, database management system, and network communication protocol. The differences between databases are the most prominent: different system data sources and application requirements lead to different data structures, and because heterogeneous databases differ in how they access and share data, real-time sharing between data cannot be achieved well. How to realize distributed storage and real-time interactive processing of heterogeneous data is therefore a current technical problem.
Summary of the invention
The object of the present invention is to provide a rapid distributed storage and interaction method for heterogeneous data that solves the problems of distributed storage and real-time interactive processing of heterogeneous data and realizes real-time sharing between data.
The object of the present invention is achieved by the following technical solution, the specific steps of which are as follows:
1) split the heterogeneous data, store it in the data center cache, and number the data categories in the cache;
2) perform de-redundancy processing on the operating-condition data in the cache;
3) calculate the proportion of the total data volume accounted for by each category of data, $P_i = S_i / S$, with $\sum_i P_i = 1$, where $S_i$ is the data volume of category $i$ and $S$ is the total data volume;
4) set a threshold vector $P'_1, P'_2, \ldots, P'_n$ with $0 < P'_1 < \cdots < P'_n < 1$, and set the values of $n_1, n_2, \ldots, n_k$, where $n_1, n_2, \ldots, n_k$ are integers greater than 0; the number of thresholds, the number of $n_i$, and their values are set according to actual needs;
5) compare $P_i$ with $P'_1, P'_2, \ldots, P'_n$: if $P_i < P'_1$, then $n_1$ categories of data are stored on the same slave server; if $P'_1 < P_i < P'_2$, then $n_2$ categories of data are stored on the same slave server; and so on; if $P_i > P'_n$, then this category of data is stored across $n_k$ slave servers;
6) establish a heterogeneous index table according to the storage addresses of the distributed storage data;
7) receive the query request sent by the user terminal, and extract keywords from the content to be retrieved;
8) look up the location of the keywords level by level according to the heterogeneous index table;
9) distribute the query result to the corresponding data storage databases according to the database configuration information, and extract the required data from those databases;
10) aggregate the required data extracted in step 9) and return it to the user terminal.
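Steps 3) to 5) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the integer server ids, and the greedy grouping are hypothetical, and it assumes that there is one more group size than there are thresholds and that a proportion exactly equal to a threshold falls into the upper band, a case the text leaves open.

```python
from bisect import bisect_right
from itertools import count


def proportions(volumes):
    """Step 3: P_i = S_i / S for every data category."""
    total = sum(volumes.values())
    return {cat: size / total for cat, size in volumes.items()}


def plan_placement(volumes, thresholds, group_sizes):
    """Steps 4-5: map each category to a list of hypothetical slave-server ids.

    thresholds  -- ascending values P'_1 < ... < P'_n, all in (0, 1)
    group_sizes -- n_1 ... n_k with k = n + 1 (an assumption of this sketch)
    """
    assert len(group_sizes) == len(thresholds) + 1
    shares = proportions(volumes)

    # Sort categories into bands: band 0 is P_i < P'_1, the last band is P_i >= P'_n.
    bands = {}
    for cat in sorted(shares):
        band = bisect_right(thresholds, shares[cat])
        bands.setdefault(band, []).append(cat)

    server_ids = count()
    placement = {}
    top_band = len(thresholds)
    for band in sorted(bands):
        cats = bands[band]
        n = group_sizes[band]
        if band == top_band:
            # Largest categories: each one is spread over n_k servers.
            for cat in cats:
                placement[cat] = [next(server_ids) for _ in range(n)]
        else:
            # Smaller categories: n categories share one server.
            for start in range(0, len(cats), n):
                sid = next(server_ids)
                for cat in cats[start:start + n]:
                    placement[cat] = [sid]
    return placement
```

With volumes `{"a": 5, "b": 5, "c": 10, "d": 80}`, thresholds `[0.08, 0.5]`, and group sizes `[2, 1, 3]`, the two small categories share one server, the middle one gets its own server, and the dominant one is spread over three.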
Further, the specific steps of establishing the heterogeneous index table in step 6) are as follows:
6-1) extract keywords from the new data set, and preprocess them to obtain the query count of each keyword in the data set;
6-2) sort the keywords by query count in ascending order to form a count table;
6-3) based on the count table, build the index level by level to form the index table, each level of which contains the corresponding keywords and their corresponding data object information;
6-4) establish the mapping between the index table and the source databases, so that the location of the data can be obtained from the index information.
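The index construction in steps 6-1) to 6-4) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method itself: the "query count" of a keyword is taken to be the number of records that contain it, keywords are obtained by whitespace splitting, each keyword is given its own level in count order, and the names `build_count_table`, `build_index`, `records`, and `locations` are hypothetical.

```python
from collections import defaultdict


def build_count_table(records):
    """Steps 6-1 and 6-2: count keywords and sort them in ascending order.

    records -- {record_id: text}; keywords are whitespace-separated words
               (an assumption, the source does not fix the extraction method).
    """
    postings = defaultdict(set)
    for record_id, text in records.items():
        for keyword in set(text.lower().split()):
            postings[keyword].add(record_id)
    # (count, keyword) tuples sort by count first, then alphabetically.
    count_table = sorted((len(ids), kw) for kw, ids in postings.items())
    return dict(postings), count_table


def build_index(records, locations):
    """Steps 6-3 and 6-4: one index entry per keyword, rarest keyword first.

    locations -- {record_id: source database name}, the mapping back to
                 the source databases.
    """
    postings, count_table = build_count_table(records)
    index = []
    for level, (cnt, keyword) in enumerate(count_table):
        ids = postings[keyword]
        index.append({
            "level": level,
            "keyword": keyword,
            "count": cnt,
            "objects": sorted(ids),
            "locations": sorted({locations[rid] for rid in ids}),
        })
    return index
```

Each entry carries both the matching data objects and the databases that hold them, so a lookup that reaches an entry already knows where to fetch the data.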
Further, the specific steps of looking up the location of the keywords level by level according to the heterogeneous index table in step 8) are as follows:
8-1) map the query request of the user terminal to the keyword library of the index, so that the original query is mapped to a target query;
8-2) sort the keywords in the query according to their counts in the count table;
8-3) read the keywords in the query one by one in ascending order of count, and query the index table level by level from top to bottom to find the matching keywords.
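One way to see why reading the keywords in ascending order of count can shorten a query is the following Python sketch. It swaps in a plain inverted index with rarest-first intersection, which is a standard technique and not necessarily the level-by-level structure the text describes. The names `map_query`, `order_by_count`, and `lookup` are hypothetical, and query terms that are absent from the keyword library are simply dropped during mapping, which is an assumption.

```python
def map_query(query, keyword_library):
    """Step 8-1: keep only the query terms known to the index, without duplicates."""
    terms = dict.fromkeys(query.lower().split())
    return [term for term in terms if term in keyword_library]


def order_by_count(keywords, postings):
    """Step 8-2: sort the query keywords by ascending count."""
    return sorted(keywords, key=lambda kw: (len(postings[kw]), kw))


def lookup(query, postings):
    """Step 8-3: intersect posting sets, starting from the rarest keyword."""
    keywords = order_by_count(map_query(query, postings), postings)
    if not keywords:
        return set()
    matches = set(postings[keywords[0]])
    for keyword in keywords[1:]:
        matches &= postings[keyword]
        if not matches:
            break  # no record can match, stop early
    return matches
```

Starting from the keyword with the smallest count keeps the candidate set small from the first step, so later intersections touch fewer records.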
Further, the specific method of aggregating the extracted required data and returning it to the user terminal in step 10) is:
extract the required data from the corresponding data sets according to the data mapping and aggregate it, convert the extracted data to the required data format, and return it to the user terminal.
Further, the specific steps of numbering the data categories in the cache in step 1) are as follows:
1-1) preprocess the collected raw industrial system data, that is, split the raw operating-condition data, verify data validity, extract the logical associations between different data, and convert the data format;
1-2) store the preprocessed operating-condition data in the cache;
1-3) number the data categories in the cache.
Further, the specific steps of performing de-redundancy processing on the operating-condition data in the cache in step 2) are as follows:
2-1) using preset data priorities, filter out the non-critical information in the operating-condition data and discard it;
2-2) extract the repeated common information in the operating-condition data;
2-3) compress the operating-condition data with a lossless compression algorithm.
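The three de-redundancy steps can be sketched in Python as shown below. The sketch makes several assumptions that are not in the source: records are dictionaries, priorities are assigned per field, "repeated common information" means fields whose value is identical in every record, and zlib's DEFLATE stands in for the unspecified lossless compression algorithm. The function names are hypothetical.

```python
import json
import zlib


def compress_records(records, priorities, min_priority):
    """Apply steps 2-1 to 2-3 to a list of dict records and return bytes."""
    # 2-1) discard fields whose preset priority is below the cut-off.
    kept = [
        {k: v for k, v in rec.items() if priorities.get(k, 0) >= min_priority}
        for rec in records
    ]
    # 2-2) pull out fields that carry the same value in every record.
    common = {}
    if kept:
        for key, value in kept[0].items():
            if all(key in rec and rec[key] == value for rec in kept):
                common[key] = value
    rows = [{k: v for k, v in rec.items() if k not in common} for rec in kept]
    # 2-3) lossless compression of the reduced payload.
    payload = json.dumps({"common": common, "rows": rows}, sort_keys=True)
    return zlib.compress(payload.encode("utf-8"))


def restore_records(blob):
    """Inverse of compress_records for the fields that were kept."""
    doc = json.loads(zlib.decompress(blob).decode("utf-8"))
    return [{**doc["common"], **row} for row in doc["rows"]]
```

Only step 2-1) loses information, by design; steps 2-2) and 2-3) are fully reversible, which `restore_records` demonstrates.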
Further, the slave servers in step 5) store data according to data hotness, and multiple categories of data may correspond to the same node. The storage space of a data node is divided by hotness into three tiers: high speed with small capacity, fast with medium capacity, and medium speed with large capacity. When new data is updated, the first tier holds the data most recently updated or most frequently accessed within a certain number of minutes, the second tier holds the data most recently updated or most frequently accessed within a certain number of days, and the third tier holds the data updated or most frequently accessed within a pre-agreed time period. The data hotness is determined from the access frequency and access time of the industrial process operating-condition data.
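A minimal Python sketch of such a three-tier assignment follows. The window lengths, the minimum access count, and the name `assign_tier` are illustrative assumptions; the source only says that hotness depends on access frequency and access time and leaves the concrete periods open.

```python
from datetime import datetime, timedelta


def assign_tier(last_access, access_count, now,
                hot_window=timedelta(minutes=10),
                warm_window=timedelta(days=7),
                min_hits=1):
    """Return 1, 2 or 3 for the storage tier of one data item.

    Tier 1: high speed, small capacity   (touched within minutes)
    Tier 2: fast, medium capacity        (touched within days)
    Tier 3: medium speed, large capacity (everything else)
    """
    age = now - last_access
    if access_count >= min_hits and age <= hot_window:
        return 1
    if access_count >= min_hits and age <= warm_window:
        return 2
    return 3
```

A production scheme would also need eviction when a tier fills up; that is outside what the text specifies, so it is not sketched here.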
By adopting the above technical solution, the present invention has the following advantages:
The distributed storage system of the present invention stores data dispersed across multiple independent devices. It uses a scalable system structure and shares the storage load among multiple storage servers, which improves the reliability, availability, and access efficiency of the system and also makes it easy to extend. The real-time interactive processing method improves data processing efficiency and enables real-time processing. The keyword count sorting strategy saves data storage space and computation and shortens index construction time. Hierarchical data querying, guided by query counts, improves query efficiency. The memory database system that is built combines an in-memory database with a disk database, using the disk database to make up for the shortcomings of the in-memory database and associating the two with each other, which improves the real-time performance of the whole system and reduces its operating load.
Other advantages, objects, and features of the present invention will be set forth to some extent in the description that follows, and to some extent will be apparent to those skilled in the art upon examination of what follows, or may be learned from practice of the present invention. The objects and other advantages of the present invention may be realized and attained by the following specification and claims.
Description of the drawings
The drawings of the present invention are described as follows.
Fig. 1 is a schematic diagram of the architecture of the present invention;
Fig. 2 is a schematic diagram of the distributed storage flow of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and embodiments.
A rapid distributed storage and interaction method for heterogeneous data, the specific steps of which are as follows:
1) split the heterogeneous data, store it in the data center cache, and number the data categories in the cache;
2) perform de-redundancy processing on the operating-condition data in the cache;
3) calculate the proportion of the total data volume accounted for by each category of data, $P_i = S_i / S$, with $\sum_i P_i = 1$, where $S_i$ is the data volume of category $i$ and $S$ is the total data volume;
4) set a threshold vector $P'_1, P'_2, \ldots, P'_n$ with $0 < P'_1 < \cdots < P'_n < 1$, and set the values of $n_1, n_2, \ldots, n_k$, where $n_1, n_2, \ldots, n_k$ are integers greater than 0; the number of thresholds, the number of $n_i$, and their values are set according to actual needs;
5) compare $P_i$ with $P'_1, P'_2, \ldots, P'_n$: if $P_i < P'_1$, then $n_1$ categories of data are stored on the same slave server; if $P'_1 < P_i < P'_2$, then $n_2$ categories of data are stored on the same slave server; and so on; if $P_i > P'_n$, then this category of data is stored across $n_k$ slave servers;
6) establish a heterogeneous index table according to the storage addresses of the distributed storage data;
The specific steps of establishing the heterogeneous index table are as follows:
6-1) extract keywords from the new data set to obtain a keyword set;
6-2) search the new data set for each keyword in the keyword set to obtain the query count of the keyword;
6-3) sort the keywords by query count in ascending order, and assign each keyword a sequence number in that order;
6-4) build the upper-level nodes according to the keyword count order, and build the index level by level to form the index table, each level of which contains the corresponding keywords and their corresponding data object information;
6-5) establish the mapping between the index table and the source databases, so that the location of the data can be obtained from the index information.
7) receive the query request sent by the user terminal, and extract keywords from the content to be retrieved;
8) look up the location of the keywords level by level according to the heterogeneous index table;
The specific steps of looking up the location of the keywords are as follows:
8-1) map the query keywords of the user terminal to the keyword library of the index, so that the original query is mapped to a target query;
8-2) search the count table to obtain the query count sequence numbers of the keywords;
8-3) read the keywords in the query one by one in ascending order of count sequence number, and query the index table level by level from top to bottom to find the matching keywords.
9) distribute the query result to the corresponding data storage databases according to the database configuration information, and extract the required data from those databases;
10) aggregate the required data extracted in step 9) and return it to the user terminal;
The specific steps are as follows:
10-1) aggregate the extracted data, encapsulate the data in Extensible Markup Language (XML) as a document in a unified format, and return it to the user terminal;
10-2) the user terminal parses the document content and converts it to the required data format.
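Steps 10-1) and 10-2) can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`result`, `record`, `field`, `source`, `name`) and the function names are hypothetical, since the source does not define the unified document format.

```python
import xml.etree.ElementTree as ET


def wrap_results(rows):
    """Step 10-1: encapsulate the aggregated rows in one XML document.

    rows -- list of {"source": database name, "fields": {name: value}}
    """
    root = ET.Element("result")
    for row in rows:
        record = ET.SubElement(root, "record", source=row["source"])
        for name, value in row["fields"].items():
            field = ET.SubElement(record, "field", name=name)
            field.text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_results(document):
    """Step 10-2: the user terminal parses the document back into dicts."""
    root = ET.fromstring(document)
    return [
        {
            "source": record.get("source"),
            "fields": {f.get("name"): f.text for f in record.findall("field")},
        }
        for record in root.findall("record")
    ]
```

All values come back as strings after parsing, so converting them to the required data format (numbers, timestamps, and so on) remains the job of the user terminal, as step 10-2) states.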
The optimized query algorithm of the present invention uses a keyword count sorting strategy, which shortens query time; splitting the heterogeneous data and storing it in a distributed manner improves data processing speed.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements may be made to the technical solution of the present invention without departing from its spirit and scope, and all such modifications shall fall within the scope of the claims of the present invention.

Claims (7)

1. A rapid distributed storage and interaction method for heterogeneous data, characterized in that the specific steps are as follows:
1) split the heterogeneous data, store it in the data center cache, and number the data categories in the cache;
2) perform de-redundancy processing on the operating-condition data in the cache;
3) calculate the proportion of the total data volume accounted for by each category of data, $P_i = S_i / S$, with $\sum_i P_i = 1$, where $S_i$ is the data volume of category $i$ and $S$ is the total data volume;
4) set a threshold vector $P'_1, P'_2, \ldots, P'_n$ with $0 < P'_1 < \cdots < P'_n < 1$, and set the values of $n_1, n_2, \ldots, n_k$, where $n_1, n_2, \ldots, n_k$ are integers greater than 0; the number of thresholds, the number of $n_i$, and their values are set according to actual needs;
5) compare $P_i$ with $P'_1, P'_2, \ldots, P'_n$: if $P_i < P'_1$, then $n_1$ categories of data are stored on the same slave server; if $P'_1 < P_i < P'_2$, then $n_2$ categories of data are stored on the same slave server; and so on; if $P_i > P'_n$, then this category of data is stored across $n_k$ slave servers;
6) establish a heterogeneous index table according to the storage addresses of the distributed storage data;
7) receive the query request sent by the user terminal, and extract keywords from the content to be retrieved;
8) look up the location of the keywords level by level according to the heterogeneous index table;
9) distribute the query result to the corresponding data storage databases according to the database configuration information, and extract the required data from those databases;
10) aggregate the required data extracted in step 9) and return it to the user terminal.
2. The rapid distributed storage and interaction method for heterogeneous data according to claim 1, characterized in that the specific steps of establishing the heterogeneous index table in step 6) are as follows:
6-1) extract keywords from the new data set, and preprocess them to obtain the query count of each keyword in the data set;
6-2) sort the keywords by query count in ascending order to form a count table;
6-3) based on the count table, build the index level by level to form the index table, each level of which contains the corresponding keywords and their corresponding data object information;
6-4) establish the mapping between the index table and the source databases, so that the location of the data can be obtained from the index information.
3. The rapid distributed storage and interaction method for heterogeneous data according to claim 1, characterized in that the specific steps of looking up the location of the keywords level by level according to the heterogeneous index table in step 8) are as follows:
8-1) map the query request of the user terminal to the keyword library of the index, so that the original query is mapped to a target query;
8-2) sort the keywords in the query according to their counts in the count table;
8-3) read the keywords in the query one by one in ascending order of count, and query the index table level by level from top to bottom to find the matching keywords.
4. The rapid distributed storage and interaction method for heterogeneous data according to claim 1, characterized in that the specific method of aggregating the extracted required data and returning it to the user terminal in step 10) is:
extract the required data from the corresponding data sets according to the data mapping and aggregate it, convert the extracted data to the required data format, and return it to the user terminal.
5. The rapid distributed storage and interaction method for heterogeneous data according to claim 1, characterized in that the specific steps of numbering the data categories in the cache in step 1) are as follows:
1-1) preprocess the collected raw industrial system data, that is, split the raw operating-condition data, verify data validity, extract the logical associations between different data, and convert the data format;
1-2) store the preprocessed operating-condition data in the cache;
1-3) number the data categories in the cache.
6. The rapid distributed storage and interaction method for heterogeneous data according to claim 1, characterized in that the specific steps of performing de-redundancy processing on the operating-condition data in the cache in step 2) are as follows:
2-1) using preset data priorities, filter out the non-critical information in the operating-condition data and discard it;
2-2) extract the repeated common information in the operating-condition data;
2-3) compress the operating-condition data with a lossless compression algorithm.
7. The rapid distributed storage and interaction method for heterogeneous data according to claim 1, characterized in that the slave servers in step 5) store data according to data hotness, and multiple categories of data may correspond to the same node; the storage space of a data node is divided by hotness into three tiers: high speed with small capacity, fast with medium capacity, and medium speed with large capacity; when new data is updated, the first tier holds the data most recently updated or most frequently accessed within a certain number of minutes, the second tier holds the data most recently updated or most frequently accessed within a certain number of days, and the third tier holds the data updated or most frequently accessed within a pre-agreed time period; the data hotness is determined from the access frequency and access time of the industrial process operating-condition data.
CN201810399691.5A | 2018-04-28 | 2018-04-28 | Rapid distributed storage and interaction method for heterogeneous data | Withdrawn | CN108647266A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810399691.5A | 2018-04-28 | 2018-04-28 | Rapid distributed storage and interaction method for heterogeneous data

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810399691.5A | 2018-04-28 | 2018-04-28 | Rapid distributed storage and interaction method for heterogeneous data

Publications (1)

Publication Number | Publication Date
CN108647266A (en) | 2018-10-12

Family

ID=63748529

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201810399691.5A | Rapid distributed storage and interaction method for heterogeneous data (Withdrawn, published as CN108647266A (en)) | 2018-04-28 | 2018-04-28

Country Status (1)

Country | Link
CN (1) | CN108647266A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20181012