CN106599178A

CN106599178A - Big data processing method capable of realizing quick search and supporting distributed storage

Info

Publication number: CN106599178A
Application number: CN201611142025.0A
Authority: CN
Inventors: 郑锐韬; 李勇波; 张恒; 孙傲冰; 季统凯
Original assignee: G Cloud Technology Co Ltd
Current assignee: G Cloud Technology Co Ltd
Priority date: 2016-12-12
Filing date: 2016-12-12
Publication date: 2017-04-26
Anticipated expiration: 2036-12-12
Also published as: CN106599178B

Abstract

The invention relates to the technical field of big data storage, and in particular to a big data processing method that enables fast search and supports distributed storage. The method starts from the characteristics of exact-match search over large data volumes. An MD5 value and a consistent-hash value are computed for the data to be located, and an MD5 field and a hash field are stored alongside it. On lookup, the hash value filters out unrelated data so that the search runs over a relatively small space, which improves the efficiency of exact search over large data volumes. Because storage is defined by the hash field, data can be distributed across multiple files or multiple servers according to hash value, which improves storage space utilization, balances the storage load, and reduces the pressure on the storage servers. In scenarios that require exact retrieval of data, the method improves storage efficiency for large data volumes and provides a fast, accurate retrieval method, greatly improving big data search efficiency.

Description

A big data processing method that enables fast search and supports distributed storage

Technical field

The present invention relates to the technical field of big data storage, and in particular to a big data processing method that enables fast search and supports distributed storage.

Background art

With the development of computing and e-commerce, applications generate more and more data, and both their data volume and their concurrency keep growing. In scenarios such as exact product search, mobile phone location lookup, and network connection checks, information about specified data must be obtained rapidly from a large amount of data within a unit of time. With general big data storage methods, quickly locating specific data in a large data set requires either traversing the data or being guided by a related index. Maintaining a massive index as data is added, changed, and deleted is itself heavy work and can greatly reduce the efficiency of storing and reading data. Such methods therefore cannot serve high-volume, highly concurrent requests well, and they become a bottleneck for the running application.

Summary of the invention

The technical problem solved by the present invention is to provide a big data processing method that enables fast search and supports distributed storage, so that data can be found quickly and accurately in a large-volume storage space while the storage itself is distributed.

The technical solution by which the present invention solves the above technical problem is as follows:

The method comprises the following steps:

Step 1: For each record to be stored, extract a feature using a defined algorithm, obtaining a unique feature that identifies the specific record and serves as the input for the subsequent value calculations. This yields a fast feature extraction method that is used both when storing and when reading data;

Step 2: Compute an MD5 value from the feature extracted from each record, then apply a hash algorithm to obtain a hash value from 1 to N, where N is chosen according to the specific data volume and how the distributed storage is divided;

Step 3: Design the storage structure so that, in addition to the space for the data itself, it has space for the MD5 value and space for the hash value. The hash value is used to go directly to the data sharing the same hash value, and the MD5 value is used to identify the exact record among the data with that hash value;

Step 4: When reading data, extract the feature of the data and compute its MD5 value and hash value. The hash value filters out most of the data, and the MD5 value then identifies the exact record within the remaining small range.

The MD5 value is computed from the extracted feature value, and a hash calculation is then applied to that MD5 value to obtain the hash value, so that a large amount of data can be stored in a distributed manner according to the computed hash values;

When storing and reading, the MD5 value and the hash value are computed by the same method.
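The key calculation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `compute_keys` is hypothetical, and the "hash algorithm" is assumed to be a simple modulo over the MD5 digest read as an integer, which the text does not specify.

```python
import hashlib
from typing import Tuple


def compute_keys(feature: str, n_buckets: int) -> Tuple[str, int]:
    """Return (md5_hex, hash_value) for a feature string.

    hash_value lies in 1..n_buckets, matching the "1 to N" range in the text.
    """
    md5_hex = hashlib.md5(feature.encode("utf-8")).hexdigest()
    # Assumption: reduce the 128-bit MD5 value to a bucket with a modulo.
    hash_value = int(md5_hex, 16) % n_buckets + 1
    return md5_hex, hash_value


# The same function is used on the write path and the read path,
# so a given feature always maps to the same MD5 value and hash value.
md5_hex, hash_value = compute_keys("13800000000", n_buckets=16)
```

Because both values are derived deterministically from the feature, nothing beyond the feature itself is needed at read time to recompute where a record was stored.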

A middleware that technically supports partitioning or a distributed architecture is selected as the storage space. When the storage space is set up, partitioned files or a distributed server architecture is established according to hash value, so that big data storage and reading are handled independently of each other with a balanced load;

When data is stored to the designed storage space, the data, the MD5 value, and the hash value are saved together, and the storage space places the data in a specific storage file or storage server according to the designed storage logic.

The partitioned files or distributed server architecture are established by hash value using a consistent hashing algorithm.
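A minimal sketch of consistent hashing for placing records on storage files or servers is given below. The class name `ConsistentHashRing`, the server names, and the use of virtual nodes are illustrative assumptions; the text names consistent hashing but does not describe a particular ring construction.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps MD5 values to storage nodes (files or servers) on a hash ring."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (ring_point, node_name)
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _point(key: str) -> int:
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str, vnodes: int = 100) -> None:
        # Assumption: each node gets several virtual points to even out load.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._point(f"{node}#{i}"), node))

    def node_for(self, md5_hex: str) -> str:
        """Return the node owning the first ring point at or after the key."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_left(self._ring, (int(md5_hex, 16), ""))
        if idx == len(self._ring):  # wrap around to the start of the ring
            idx = 0
        return self._ring[idx][1]


ring = ConsistentHashRing(["server-1", "server-2", "server-3"])
target = ring.node_for(hashlib.md5(b"13800000000").hexdigest())
```

The design choice this illustrates is that adding a server only reassigns the keys that now fall to the new server's ring points; records on the other servers stay where they are, which keeps rebalancing cheap as the data volume grows.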

When reading data, the computed hash value is used to determine, within the partitioned or distributed storage space, the file or server that holds the same hash value, and that file or server is read;

The data with the same hash value is read out and then compared by MD5 value to obtain the records with the identical MD5 value, so that the required data is found quickly.
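The two-stage read can be sketched with an in-memory stand-in for the partitioned storage. The names `partitions`, `store`, and `find`, the bucket count, and the modulo-based hash are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict

N_BUCKETS = 8  # hypothetical value of N

# In-memory stand-in for partitioned files or servers: hash value -> records.
partitions = defaultdict(list)


def keys_for(feature: str):
    md5_hex = hashlib.md5(feature.encode("utf-8")).hexdigest()
    return md5_hex, int(md5_hex, 16) % N_BUCKETS + 1


def store(feature: str, data: str) -> None:
    md5_hex, hash_value = keys_for(feature)
    # Data, MD5 value, and hash value are saved together.
    partitions[hash_value].append(
        {"md5": md5_hex, "hash": hash_value, "data": data}
    )


def find(feature: str):
    md5_hex, hash_value = keys_for(feature)
    # Stage 1: the hash value selects one partition and skips all the others.
    candidates = partitions.get(hash_value, [])
    # Stage 2: the MD5 value picks the exact records inside that partition.
    return [rec["data"] for rec in candidates if rec["md5"] == md5_hex]


store("13800000000", "location A")
store("13900000000", "location B")
result = find("13800000000")
```

Only the records in one partition are compared by MD5, so the cost of a lookup depends on the size of that partition rather than on the total data volume.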

The beneficial effects of the invention are:

The method starts from the characteristics of exact-match search over large data volumes. An MD5 value and a consistent-hash value are computed for the data to be located, and an MD5 field and a hash field are added to support precise positioning during exact search, so that unrelated data is filtered out and the search runs over a relatively small space, which improves the efficiency of exact search over large data volumes. At the same time, because storage is defined by the hash field, data can be distributed across multiple files or multiple servers according to hash value, which improves storage space utilization for large data volumes, balances the storage load, and reduces the pressure on the storage servers.

Description of the drawings

The present invention is further described below with reference to the accompanying drawing:

Figure 1 is a flow chart of the computer software functional units of the present invention.

Detailed description of the embodiments

As shown in Figure 1, the method of the present invention is implemented in the following steps:

Step 1: On the data storage middleware, set up the storage space for the data, the MD5 values, and the hash values, and design the table partitions or distributed server storage of that space by hash value, using consistent hashing;

Step 2: Define a specific data feature extraction method, and use it to extract a feature from each record to be added;

Step 3: Compute an MD5 value from the feature extracted from each record, then apply a hash algorithm to obtain a hash value from 1 to N;

Step 4: Save the data, the MD5 value, and the hash value to the storage space. Within the designed range, the storage middleware automatically saves the data to separate files or separate servers by hash value;

Step 5: When reading data, first extract the feature of the data to be read using the same method and compute its MD5 value and hash value. Use the hash value to read the data with the same hash value from the storage middleware, which locates the file or server where the data is stored by hash value, so that only a very small amount of data is read. Then compare MD5 values to find the matching records and return the specified data.

The specific steps for designing the consistent hash table on the storage middleware are:

Step 1: Select an available storage middleware, such as the commonly used MySQL or MongoDB;

Step 2: Design the storage space on the storage middleware, with space for the data, the MD5 value, and the hash value;

Step 3: Design the storage partitions by hash value range, for example treating every 1,000,000 records as one storage space, which yields a balanced data storage space.
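The storage design above can be sketched with Python's built-in `sqlite3` module standing in for the storage middleware. SQLite is used only so the example runs without a server; the text names MySQL or MongoDB. The table names `records_p1` to `records_p4`, the column names, the partition count, and the one-table-per-hash-value layout are all assumptions.

```python
import hashlib
import sqlite3

N_PARTITIONS = 4  # hypothetical value of N

conn = sqlite3.connect(":memory:")
for p in range(1, N_PARTITIONS + 1):
    # One table per hash value: space for the data, the MD5 value,
    # and the hash value, as in step 2 above.
    conn.execute(
        f"CREATE TABLE records_p{p} ("
        "md5 TEXT NOT NULL, hash_value INTEGER NOT NULL, data TEXT NOT NULL)"
    )
    conn.execute(f"CREATE INDEX idx_md5_p{p} ON records_p{p} (md5)")


def keys_for(feature: str):
    md5_hex = hashlib.md5(feature.encode("utf-8")).hexdigest()
    return md5_hex, int(md5_hex, 16) % N_PARTITIONS + 1


def insert(feature: str, data: str) -> None:
    md5_hex, hash_value = keys_for(feature)
    conn.execute(
        f"INSERT INTO records_p{hash_value} (md5, hash_value, data) "
        "VALUES (?, ?, ?)",
        (md5_hex, hash_value, data),
    )


def lookup(feature: str):
    md5_hex, hash_value = keys_for(feature)
    rows = conn.execute(
        f"SELECT data FROM records_p{hash_value} WHERE md5 = ?", (md5_hex,)
    ).fetchall()
    return [row[0] for row in rows]
```

The table name is built from an integer computed in code, never from user input, while the MD5 value and data are passed as bound parameters. In a database with native partitioning, the same layout would more likely be a single table partitioned on the hash column.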

The specific ways of extracting the data feature are:

Step 1: If the data inherently has a clear feature, use it directly as the data feature, for example a network address;

Step 2: If the generation time of the data can serve as the data feature, use the time as the data feature;

Step 3: If the device that produced the data can serve as the data feature, use the unique identifier of the device as the data feature, for example a mobile phone number;

Step 4: If no single feature can serve as a unique identifier, use a combination of features as the identifier, for example device + time.
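The feature selection rules above can be sketched as a single function. The field names `url`, `device_id`, and `created_at` are hypothetical, and the order of preference (inherent feature first, then the device + time combination, then a single field) is an assumption; whether a single field is unique enough depends on the application.

```python
from typing import Mapping


def extract_feature(record: Mapping[str, str]) -> str:
    """Pick a feature string that identifies the record."""
    # Rule 1: an inherent, clear feature such as a network address.
    if record.get("url"):
        return record["url"]

    device = record.get("device_id")    # e.g. a mobile phone number
    created = record.get("created_at")  # generation time of the data

    # Rule 4: combine device and time when both are available,
    # assuming neither is unique on its own.
    if device and created:
        return f"{device}+{created}"
    # Rules 2 and 3: fall back to whichever single feature exists.
    if device:
        return device
    if created:
        return created
    raise ValueError("record has no usable feature")
```

The same function has to be applied when storing and when reading, since any difference in the extracted feature changes the MD5 value and the hash value.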

The key to this method of quickly finding specific data within big data is that a clear feature can be extracted from each record. One record may yield several features, but the feature that is used must be unique; with it, the method can locate and search rapidly and find the required data quickly.

By using a specific data storage middleware to set up partitions or distributed servers by hash value, the data is stored in classified groups, which reduces the load on any single large file or server and thereby improves the efficiency of storing and reading big data.

Claims

1. A big data processing method that enables fast search and supports distributed storage, characterized in that the method comprises the following steps:

Step 1: For each record to be stored, extract a feature using a defined algorithm, obtaining a unique feature that identifies the specific record and serves as the input for the subsequent value calculations, and thereby forming a fast feature extraction method that is used both when storing and when reading data;

Step 2: Compute an MD5 value from the feature extracted from each record, then apply a hash algorithm to obtain a hash value from 1 to N, where N is chosen according to the specific data volume and how the distributed storage is divided;

Step 3: Design the storage structure so that, in addition to the space for the data itself, it has space for the MD5 value and space for the hash value, the hash value being used to go directly to the data sharing the same hash value and the MD5 value being used to identify the exact record among the data with that hash value;

Step 4: When reading data, extract the feature of the data and compute its MD5 value and hash value, filter out most of the data by hash value, and identify the exact record by MD5 value within the remaining small range.

2. The method according to claim 1, characterized in that:

The MD5 value is computed from the extracted feature value, and a hash calculation is then applied to that MD5 value to obtain the hash value, so that a large amount of data can be stored in a distributed manner according to the computed hash values;

3. The method according to claim 1, characterized in that:

A middleware that technically supports partitioning or a distributed architecture is selected as the storage space; when the storage space is set up, partitioned files or a distributed server architecture is established according to hash value, so that big data storage and reading are handled independently of each other with a balanced load;

When data is stored to the designed storage space, the data, the MD5 value, and the hash value are saved together, and the storage space places the data in a specific storage file or storage server according to the designed storage logic.

4. The method according to claim 2, characterized in that:

5. The method according to claim 3 or 4, characterized in that: the partitioned files or distributed server architecture are established by hash value using a consistent hashing algorithm.

6. The method according to any one of claims 1 to 4, characterized in that:

When reading data, the computed hash value is used to determine, within the partitioned or distributed storage space, the file or server that holds the same hash value, and that file or server is read;

The data with the same hash value is read out and then compared by MD5 value to obtain the records with the identical MD5 value, so that the required data is found quickly.

7. The method according to claim 5, characterized in that: