CN106599178A - Big data processing method capable of realizing quick search and supporting distributed storage - Google Patents
- Publication number: CN106599178A
- Application number: CN201611142025.0A
- Authority: CN (China)
- Prior art keywords: data, storage, hash value, values, big data
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
Abstract
The invention relates to the technical field of big data storage, and in particular to a big data processing method that enables fast lookup and supports distributed storage. The method analyzes the characteristics of exact lookup over large data volumes. It applies MD5 and consistent hash calculations to the data to be located and adds an MD5 field and a hash field, which supports precise positioning during exact lookup: data of low relevance is filtered out and the search runs within a relatively small space, improving the efficiency of exact lookup over large data volumes. By defining storage in terms of the hash field, data can be distributed across multiple files or multiple servers according to different hash values, which increases storage space utilization, balances the storage load, and reduces the pressure on storage servers. In scenarios that require exact data retrieval, the method improves storage efficiency for large data volumes and provides a fast and accurate retrieval method, greatly improving big data lookup efficiency.
Description
Technical field
The present invention relates to the technical field of big data storage, and in particular to a big data processing method that enables fast lookup and supports distributed storage.
Background
With the development of computing and e-commerce, applications generate more and more data, and both data volume and concurrency keep growing. In scenarios such as exact product search, mobile phone positioning, and network connection inspection, the information for a specified item must be retrieved quickly from a large volume of data within a unit of time. With ordinary big data storage methods, quickly locating a specific item in a large volume of data requires either traversing the data or relying on a related index. Maintaining a massive index as data is added, modified, and deleted is itself heavy work and can significantly degrade storage and read efficiency. Such methods cannot adequately serve large-volume, highly concurrent requests, and they become a bottleneck for the running application.
Summary of the invention
The technical problem solved by the present invention is to provide a big data processing method that enables fast lookup and supports distributed storage, so that data can be found quickly and accurately in a large-volume storage space and stored in a distributed manner.
The technical solution by which the present invention solves the above technical problem is as follows.
The method comprises the following steps:
Step 1: Extract a feature from each data item to be stored using a defined algorithm, obtaining a unique feature that identifies the specific item. This feature is used in the subsequent value calculations. The extraction is formalized as a method that can be applied quickly, both when storing and when reading data.
Step 2: Compute an MD5 value from the feature extracted from each data item, then apply a hash algorithm to obtain a hash value in the range 1 to N. The size of N is chosen according to the data volume and how the distributed storage is divided.
Step 3: Design the storage structure so that, in addition to space for the data itself, it has space for the MD5 value and space for the hash value. The hash value is used to directly locate the data that shares the same hash value; the MD5 value is used to identify the exact item among the data with the same hash value.
Step 4: When reading data, extract the feature of the data, compute the MD5 value and the hash value, filter out most of the data by the hash value, and then identify the exact item by MD5 within the remaining small range.
The MD5 value is computed from the extracted feature value; a hash calculation is then applied to the MD5 value to obtain the hash value, so that large volumes of data can be stored in a distributed manner according to the computed hash value.
The MD5 value and the hash value are computed by the same method when storing and when reading.
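The two calculations described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not specify which hash algorithm maps the MD5 value to 1..N, so the modulo mapping, the function names `md5_of_feature` and `hash_bucket`, and the sample feature string are all assumptions.

```python
import hashlib


def md5_of_feature(feature: str) -> str:
    """Return the hex MD5 digest of a feature string (UTF-8 encoded)."""
    return hashlib.md5(feature.encode("utf-8")).hexdigest()


def hash_bucket(md5_hex: str, n: int) -> int:
    """Map an MD5 hex digest to a hash value in 1..n.

    Assumption: a simple modulo mapping; the source only says
    "a hash algorithm" that yields a value from 1 to N.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return int(md5_hex, 16) % n + 1


# Hypothetical feature: a web address used as the unique feature.
md5_value = md5_of_feature("http://example.com/item/42")
bucket = hash_bucket(md5_value, 16)
```

Because both functions are deterministic, calling them with the same feature at store time and at read time yields the same MD5 value and the same hash value, which is what the unified-method requirement relies on.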
Middleware that technically supports partitioning or a distributed architecture is selected as the storage space. When the storage space is set up, partitioned files or a distributed server architecture are established according to the hash value, so that big data storage and read operations are served separately and the load is balanced.
When data is stored to the designed storage space, the data, the MD5 value, and the hash value are saved together, and the storage space stores the data to a specific storage file or storage server according to the designed storage logic.
The partitioned files or distributed server architecture are established according to the hash value using a consistent hashing algorithm.
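For readers unfamiliar with consistent hashing, the sketch below shows one common form of it: a hash ring with virtual nodes. The patent does not describe its ring layout, so the class `ConsistentHashRing`, the `replicas` parameter, and the server names are hypothetical and serve only to illustrate the general technique.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual nodes per physical node
        self._keys = []           # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _position(key: str) -> int:
        # MD5 is used here only to place keys on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            pos = self._position(f"{node}#{i}")
            if pos not in self._owner:
                bisect.insort(self._keys, pos)
            self._owner[pos] = node

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring position after the key."""
        if not self._keys:
            raise LookupError("ring is empty")
        pos = self._position(key)
        idx = bisect.bisect_right(self._keys, pos) % len(self._keys)
        return self._owner[self._keys[idx]]


ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
keys = [f"device-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

# Adding a server should only move keys onto the new server.
ring.add_node("server-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The property worth noting is that keys which change owner after `add_node` all land on the newly added server; keys on the existing servers are otherwise left in place, which is why this family of algorithms suits storage that grows over time.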
When reading data, the computed hash value is used to determine, within the partitioned or distributed server storage space, the file or server that holds the same hash value, and the data is read from it.
The data with the same hash value is read out and then compared by MD5 value; the item with the identical MD5 value is obtained, so that the required data is found quickly.
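The read path just described can be illustrated with an in-memory stand-in for the partitioned storage. This is a sketch under stated assumptions: the `PartitionedStore` class, the modulo mapping in `hash_value`, the choice of 8 partitions, and the `device-…` feature strings are hypothetical and do not come from the source.

```python
import hashlib
from collections import defaultdict


def md5_hex(feature: str) -> str:
    return hashlib.md5(feature.encode("utf-8")).hexdigest()


def hash_value(digest: str, n: int) -> int:
    # Assumption: modulo mapping of the MD5 value into 1..n.
    return int(digest, 16) % n + 1


class PartitionedStore:
    """Each partition stands in for one storage file or storage server."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.partitions = defaultdict(list)  # hash value -> [(md5, data)]

    def put(self, feature: str, data: object) -> None:
        digest = md5_hex(feature)
        self.partitions[hash_value(digest, self.n)].append((digest, data))

    def get(self, feature: str) -> list:
        digest = md5_hex(feature)
        # The hash value selects one partition, skipping all the others.
        candidates = self.partitions.get(hash_value(digest, self.n), [])
        # The MD5 comparison picks the exact item within that partition.
        return [data for stored_md5, data in candidates if stored_md5 == digest]


store = PartitionedStore(n=8)
for i in range(1000):
    store.put(f"device-{i}", {"id": i})

result = store.get("device-7")
```

With N partitions of roughly equal size, a lookup examines only about 1/N of the stored items before the MD5 comparison, which is the filtering effect the method depends on.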
The beneficial effects of the present invention are as follows:
The method analyzes the characteristics of exact lookup over large data volumes. It applies MD5 and consistent hash calculations to the data to be located exactly, and adds an MD5 field and a hash field to support precise positioning during exact lookup. Data of low relevance is filtered out, and the search runs within a relatively small space, which improves the efficiency of exact lookup over large data volumes. In addition, by defining storage in terms of the hash field, data can be distributed across multiple files or multiple servers according to different hash values, which improves storage space utilization for large data volumes, balances the storage load, and reduces the pressure on storage servers.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawing:
Figure 1 is a flow chart of the computer software functional units of the present invention.
Detailed description
As shown in Figure 1, the method of the present invention is implemented in the following steps:
Step 1: On the storage middleware, set up storage space for the data, for the MD5 value, and for the hash value, and design the table partitions or distributed server storage of the storage space according to the hash value, using the consistent hashing method.
Step 2: Define a specific data feature extraction method, and extract the feature of each data item to be added using that method.
Step 3: Compute an MD5 value from the feature extracted from each data item, then apply a hash algorithm to obtain a hash value in the range 1 to N.
Step 4: Save the data, the MD5 value, and the hash value to the storage space. The storage middleware automatically saves the data to separate files or separate servers by hash value, according to the designed ranges.
Step 5: When reading data, first extract the feature of the data to be read using the same method, and compute the MD5 value and the hash value. Read the data with the same hash value from the storage middleware; the middleware uses the hash value to locate the file or server where the data is stored, so only a very small amount of data is read. Compare MD5 values to find the matching data, and return the specified data information.
The storage middleware and its consistent hash table are designed in the following specific steps:
Step 1: Select an available storage middleware, for example commonly used middleware such as MySQL or MongoDB.
Step 2: Design the storage space on the storage middleware, with space for the data, the MD5 value, and the hash value, for storing the data.
Step 3: Design the storage partitions by hash value range. For example, treating every 1,000,000 records as one storage space yields a balanced data storage space.
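The three design steps above can be sketched with a small relational schema. Note the substitutions: the text names MySQL or MongoDB, whereas this sketch uses Python's built-in `sqlite3` purely as a stand-in so that it runs without a server, and an index on the hash column stands in for real table partitioning. The table name `records`, the column names, the helper functions, and the modulo mapping are assumptions; only the figure of 1,000,000 records per storage space comes from the text.

```python
import hashlib
import math
import sqlite3

RECORDS_PER_PARTITION = 1_000_000  # example figure from the text


def partition_count(expected_records: int) -> int:
    """Choose N so each partition holds about RECORDS_PER_PARTITION rows."""
    return max(1, math.ceil(expected_records / RECORDS_PER_PARTITION))


def keys_for(feature: str, n: int) -> tuple:
    """Return (md5, hash value in 1..n) for a feature; modulo is an assumption."""
    digest = hashlib.md5(feature.encode("utf-8")).hexdigest()
    return digest, int(digest, 16) % n + 1


def create_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Space for the data, the MD5 value, and the hash value.
    conn.execute(
        "CREATE TABLE records ("
        " hash_value INTEGER NOT NULL,"
        " md5 TEXT NOT NULL,"
        " data TEXT NOT NULL)"
    )
    # Stand-in for partitioning by hash value.
    conn.execute("CREATE INDEX idx_records_hash ON records (hash_value)")
    return conn


def save(conn: sqlite3.Connection, feature: str, data: str, n: int) -> None:
    digest, bucket = keys_for(feature, n)
    conn.execute(
        "INSERT INTO records (hash_value, md5, data) VALUES (?, ?, ?)",
        (bucket, digest, data),
    )


def find(conn: sqlite3.Connection, feature: str, n: int) -> list:
    digest, bucket = keys_for(feature, n)
    rows = conn.execute(
        "SELECT data FROM records WHERE hash_value = ? AND md5 = ?",
        (bucket, digest),
    ).fetchall()
    return [row[0] for row in rows]


n = partition_count(25_000_000)  # hypothetical expected data volume
conn = create_store()
save(conn, "13800000000+2016-12-12T08:00:00", "location fix A", n)
found = find(conn, "13800000000+2016-12-12T08:00:00", n)
```

In a middleware with native partitioning, the `hash_value` column would presumably be the partition key, so that the first predicate in `find` restricts the query to a single partition before the MD5 comparison runs.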
The feature of the data is extracted in the following specific steps:
Step 1: If the data itself has an explicit feature, use it directly as the data feature, for example a web address.
Step 2: If the generation time of the data can serve as the data feature, use the time as the data feature.
Step 3: If the device associated with the data can serve as the data feature, use the unique identifier of the device as the data feature, for example a mobile phone number.
Step 4: If no single unique identifier can serve as the data feature, use a combination of features as the identifier, for example device + time.
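The four extraction cases above can be sketched as a single function. Which case applies depends on the data set, so this illustration leaves the choice to the caller through a `strategy` argument; that argument, the function name `extract_feature`, and the record field names `url`, `created_at`, and `device_id` are hypothetical.

```python
from typing import Mapping


def extract_feature(record: Mapping[str, object], strategy: str) -> str:
    """Return the feature string for a record under the chosen strategy."""
    if strategy == "explicit":
        # Case 1: the data carries an explicit feature, e.g. a web address.
        return str(record["url"])
    if strategy == "time":
        # Case 2: the generation time identifies the data.
        return str(record["created_at"])
    if strategy == "device":
        # Case 3: the device's unique identifier, e.g. a phone number.
        return str(record["device_id"])
    if strategy == "device+time":
        # Case 4: no single identifier is unique, so combine features.
        return f"{record['device_id']}+{record['created_at']}"
    raise ValueError(f"unknown strategy: {strategy}")


sample = {"device_id": "13800000000", "created_at": "2016-12-12T08:00:00"}
feature = extract_feature(sample, "device+time")
```

Whatever strategy is chosen, the same one has to be used when storing and when reading, since the MD5 value and hash value are derived from the resulting string.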
The key to this method for quickly finding specific data in big data is that an explicit feature can be extracted from each data item. A single item may yield multiple features, but the feature chosen must be unique, so that the method can position and look up quickly and find the required data.
By using the specific data storage middleware to set up partitions or distributed servers according to the hash value, data is stored by class, which reduces the load on large files or servers and thereby improves the efficiency of storing and reading big data.
Claims (7)
1. A big data processing method that enables fast lookup and supports distributed storage, characterized in that the method comprises the following steps:
Step 1: extracting a feature from each data item to be stored using a defined algorithm, obtaining a unique feature that identifies the specific item for use in the subsequent value calculations, and forming a method that can quickly extract data features, for use when storing and reading data;
Step 2: computing an MD5 value from the feature extracted from each data item, then applying a hash algorithm to obtain a hash value in the range 1 to N, the size of N being chosen according to the data volume and how the distributed storage is divided;
Step 3: designing the storage structure so that, in addition to space for the data, it has space for the MD5 value and space for the hash value, the hash value being used to directly locate the data that shares the same hash value, and the MD5 value being used to identify the exact item among the data with the same hash value;
Step 4: when reading data, extracting the feature of the data, computing the MD5 value and the hash value, filtering out most of the data by the hash value, and identifying the exact item by MD5 within the remaining small range.
2. The method according to claim 1, characterized in that:
the MD5 value is computed from the extracted feature value, and a hash calculation is then applied to the MD5 value to obtain the hash value, so that large volumes of data are stored in a distributed manner according to the computed hash value;
the MD5 value and the hash value are computed by the same method when storing and when reading.
3. The method according to claim 1, characterized in that:
middleware that technically supports partitioning or a distributed architecture is selected as the storage space; when the storage space is set up, partitioned files or a distributed server architecture are established according to the hash value, so that big data storage and read operations are served separately and the load is balanced;
when data is stored to the designed storage space, the data, the MD5 value, and the hash value are saved together, and the storage space stores the data to a specific storage file or storage server according to the designed storage logic.
4. The method according to claim 2, characterized in that:
middleware that technically supports partitioning or a distributed architecture is selected as the storage space; when the storage space is set up, partitioned files or a distributed server architecture are established according to the hash value, so that big data storage and read operations are served separately and the load is balanced;
when data is stored to the designed storage space, the data, the MD5 value, and the hash value are saved together, and the storage space stores the data to a specific storage file or storage server according to the designed storage logic.
5. The method according to claim 3 or 4, characterized in that: the partitioned files or distributed server architecture are established according to the hash value using a consistent hashing algorithm.
6. The method according to any one of claims 1 to 4, characterized in that:
when reading data, the computed hash value is used to determine, within the partitioned or distributed server storage space, the file or server that holds the same hash value, and the data is read from it;
the data with the same hash value is read out and then compared by MD5 value, and the item with the identical MD5 value is obtained, so that the required data is found quickly.
7. The method according to claim 5, characterized in that:
when reading data, the computed hash value is used to determine, within the partitioned or distributed server storage space, the file or server that holds the same hash value, and the data is read from it;
the data with the same hash value is read out and then compared by MD5 value, and the item with the identical MD5 value is obtained, so that the required data is found quickly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611142025.0A CN106599178B (en) | 2016-12-12 | 2016-12-12 | Big data processing method capable of realizing quick search and supporting distributed storage |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611142025.0A CN106599178B (en) | 2016-12-12 | 2016-12-12 | Big data processing method capable of realizing quick search and supporting distributed storage |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106599178A (en) | 2017-04-26 |
CN106599178B (en) | 2019-08-30 |
Family
ID=58597641
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611142025.0A Active CN106599178B (en) | Big data processing method capable of realizing quick search and supporting distributed storage | 2016-12-12 | 2016-12-12 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106599178B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120166403A1 (en) * | 2010-12-24 | 2012-06-28 | Kim Mi-Jeom | Distributed storage system having content-based deduplication function and object storing method |
CN104239572A (en) * | 2014-09-30 | 2014-12-24 | 普元信息技术股份有限公司 | System and method for achieving metadata analysis based on distributed cache |
Non-Patent Citations (2)
Title |
---|
PEILUN LI ET AL: "Optimizing Hash-based Distributed Storage Using Client Choices", 《APSYS "16 PROCEEDINGS OF THE 7TH ACM SIGOPS ASIA-PACIFIC WORKSHOP ON SYSTEMS》 * |
黄秋兰 等: "分布式存储系统的哈希算法研究", 《计算机工程与应用》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108153838A (en) * | 2017-12-15 | 2018-06-12 | 济南中维世纪科技有限公司 | A kind of MySQL database middleware preprocess method |
CN108153838B (en) * | 2017-12-15 | 2022-03-11 | 山东中维世纪科技股份有限公司 | MySQL database middleware preprocessing method |
CN111258966A (en) * | 2020-01-14 | 2020-06-09 | 软通动力信息技术有限公司 | Data deduplication method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106599178B (en) | 2019-08-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 523808 19th Floor, Cloud Computing Center, Chinese Academy of Sciences, No. 1 Kehui Road, Songshan Lake Hi-tech Industrial Development Zone, Dongguan City, Guangdong Province. Applicant after: G-Cloud Technology Co., Ltd. Address before: 523808 Guangdong province Dongguan City Songshan Lake Science and Technology Industrial Park Building No. 14 Keyuan pine. Applicant before: G-Cloud Technology Co., Ltd. |
GR01 | Patent grant | ||