CN104765871A - Storage method for extracting big data from Internet

Info

Publication number: CN104765871A
Application number: CN201510200122.XA
Authority: CN
Inventors: 严澜
Original assignee: Chengdu Chuan Hang Information Technology Co Ltd
Current assignee: Chengdu Chuan Hang Information technology company limited; Suzhou Chong Xing Mdt InfoTech Ltd
Priority date: 2015-04-26
Filing date: 2015-04-26
Publication date: 2015-07-08

Abstract

The invention discloses a storage method for extracting big data from the Internet. The method comprises: a data request step of inputting a search keyword at a client, setting a search range, and obtaining data related to the search keyword from the Internet within that range; a downloading step in which the client downloads all the data related to the search keyword and forwards it to a capacity distributor; and a forwarding and storing step in which the capacity distributor divides the data related to the search keyword into a plurality of independent data units, records the capacity of each data unit, and stores the data units sequentially, in time-axis order, in a database set comprising M independent databases. After the capacity distributor stores the current data unit in the Nth database, the Nth database returns its remaining capacity information; when the remaining capacity of the Nth database is smaller than the capacity of the next data unit, the capacity distributor begins storing data units in the (N+1)th database.

Description

Storage method for extracting big data from the Internet

Technical field

The present invention relates to the technical field of big data processing, and specifically to a storage method for extracting big data from the Internet.

Background art

For data processing enterprises, and particularly for big data processing, data must be extracted to build a database for a given category of data, and the capacity of such a database is very large. When building it, therefore, the most important parameter is the database capacity. Conventional databases generally have a small capacity; large-scale databases have a large capacity, but their construction cost is very high. For example, a database of about 2TB generally costs hundreds of thousands of yuan to build, and assembling a 20TB database costs millions. For an ordinary small business this is a huge expense. A database construction method is therefore needed that reduces cost while ensuring that the stored data remains continuous.

Summary of the invention

The object of the present invention is to provide a storage method for extracting big data from the Internet that can build a large-capacity database at low cost while maintaining the continuity of the database.

The object of the present invention is achieved through the following technical solution: a storage method for extracting big data from the Internet, comprising the following steps:

Data request step: a search keyword is input at the client, a search range is set, and data related to the search keyword is obtained from the Internet within the search range;

Downloading step: the client downloads all the data related to the search keyword and forwards it to the capacity distributor;

Forwarding and storing step: the capacity distributor divides the data related to the search keyword into several independent data units and records the capacity of each data unit, while storing the data units sequentially, in time-axis order, in a database set. The database set comprises M independent databases: database 1, database 2, ..., database M. After the capacity distributor stores the current data unit in the Nth database, the Nth database returns its remaining capacity information. When the remaining capacity of the Nth database is smaller than the capacity of the next data unit, the capacity distributor begins storing data units in the (N+1)th database, and so on, until all the data related to the search keyword has been stored. N and M are positive integers.
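The forwarding and storing step can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Database` class, the `distribute` function, and the use of GB as the size unit are all hypothetical names and assumptions introduced here. The sketch also assumes that the distributor knows the remaining capacity of a database before the first unit is stored in it.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """One small independent database (hypothetical in-memory model)."""
    capacity: int                      # total capacity, in GB (assumed unit)
    units: list = field(default_factory=list)
    used: int = 0

    def store(self, name: str, size: int) -> int:
        """Store one data unit and return the remaining capacity."""
        self.units.append(name)
        self.used += size
        return self.capacity - self.used


def distribute(units, databases):
    """Store (name, size) units in the given order.

    Moves on to database N+1 when the remaining capacity reported by
    database N is smaller than the size of the next unit.
    Returns a mapping of unit name to database index.
    """
    n = 0
    remaining = databases[0].capacity - databases[0].used
    placement = {}
    for name, size in units:
        while remaining < size:        # current database cannot hold the unit
            n += 1
            if n >= len(databases):
                raise RuntimeError("database set is full")
            remaining = databases[n].capacity - databases[n].used
        remaining = databases[n].store(name, size)
        placement[name] = n
    return placement
```

Because the distributor never returns to an earlier database, units stay in their original order across the whole set, which is one way to read the "continuity" the text describes.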

The design concept of the above method is as follows. The database set of the present invention comprises M independent databases, all of which are small-capacity databases. Using the above storage method, these low-cost, small-capacity databases are assembled into a database that can hold big data, replacing a conventional large-capacity database. Each of these independent databases costs only a few thousand yuan to build, and a database built by the above method still maintains the storage continuity of the data during storage.

To illustrate the advantages of the present invention, consider an example. Suppose we want to assemble a database whose search range is "science fiction film". The number of science fiction films on the Internet is huge, so a large amount of storage capacity is needed. Assume that a single science fiction film is 2GB and that there are 10,000 science fiction films on the Internet; the target database then needs a total capacity of 20TB.

With the existing approach to building a large database, three 8TB databases are used to store the data. The three 8TB databases are independent of one another, with no association between them and no continuity across them, so the data is stored in a disorderly way. To fetch any one item, the whole database must be locked, so retrieval takes a long time.

With the method of the present invention, twenty small 1TB databases are used. At 3,000 yuan per database, the whole database set costs 60,000 yuan, whereas a single existing 8TB database costs up to several hundred thousand yuan, because an 8TB database requires greater computing capability and caching conditions. Once the 20 databases and the capacity distributor are set up according to the present invention, the capacity distributor stores the science fiction film data from the Internet in time-axis order, makes a key, and forwards the key to the client. When retrieving, we first search the keys; after finding the matching key, we search the independent database corresponding to that key and finally fetch the matching content from that database.
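The figures in this example can be checked with a few lines of arithmetic. The constant and function names below are hypothetical, and the sketch assumes decimal units (1TB = 1,000GB), which is what the 20TB total in the text implies.

```python
GB_PER_TB = 1_000  # assumption: decimal units, consistent with the 20TB figure


def total_capacity_tb(item_size_gb: int, item_count: int) -> float:
    """Total storage needed, in TB."""
    return item_size_gb * item_count / GB_PER_TB


def set_cost_yuan(db_count: int, cost_per_db_yuan: int) -> int:
    """Cost of a database set built from identical small databases."""
    return db_count * cost_per_db_yuan


needed_tb = total_capacity_tb(item_size_gb=2, item_count=10_000)
small_set_cost = set_cost_yuan(db_count=20, cost_per_db_yuan=3_000)
print(needed_tb, small_set_cost)
```

Twenty 1TB databases therefore match the 20TB target exactly, at the total cost of 60,000 yuan stated in the text.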

The capacity of each database is less than or equal to 1TB.

After the capacity distributor finishes storing the data units, it makes a key from the storage location of each data unit and forwards the key to the client.
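One possible shape for such a key is a table that maps each data unit to the database and position where it was stored. The sketch below is only an illustration under that assumption: the text does not specify the key format, and `make_key_table`, `retrieve`, and the list-of-lists model of the databases are hypothetical.

```python
def make_key_table(stored_units):
    """Build a key table from (unit_name, database_index, slot) records."""
    return {name: (db_index, slot) for name, db_index, slot in stored_units}


def retrieve(key_table, databases, name):
    """Look up the key first, then read only the database it points to."""
    db_index, slot = key_table[name]
    return databases[db_index][slot]


# Two small databases modelled as plain lists (illustrative data only).
databases = [["film-0001", "film-0002"], ["film-0003"]]
stored = [("film-0001", 0, 0), ("film-0002", 0, 1), ("film-0003", 1, 0)]
keys = make_key_table(stored)
```

With the key table held at the client, a lookup touches a single small database instead of scanning the whole set.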

All data units are stored sequentially in time-axis order.

Before storing data units, the capacity distributor screens out and temporarily holds the data units whose capacity is greater than 2GB, and stores the data units smaller than 2GB first. Once all data units smaller than 2GB have been stored, it begins storing the data units larger than 2GB.
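The screening rule can be sketched as a simple two-pass ordering. The function name `storage_order` is hypothetical, and one detail is an assumption: the text does not say how a unit of exactly 2GB is handled, so this sketch stores it in the first pass.

```python
THRESHOLD_GB = 2


def storage_order(units):
    """Return (name, size_gb) units in the order they would be stored.

    Units larger than the threshold are held back; the rest go first.
    The original (time-axis) order is preserved within each group.
    Assumption: a unit of exactly 2GB is not held back.
    """
    first = [u for u in units if u[1] <= THRESHOLD_GB]
    held = [u for u in units if u[1] > THRESHOLD_GB]
    return first + held
```

For example, `storage_order([("a", 3), ("b", 1), ("c", 2), ("d", 5)])` places `b` and `c` before `a` and `d`.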

The advantages of the invention are low cost and good data storage continuity.

Brief description of the drawings

Fig. 1 is a schematic diagram of data storage according to the present invention.

Embodiment

The present invention is described in further detail below with reference to an embodiment and the accompanying drawing, but the embodiments of the present invention are not limited thereto.

Embodiment 1:

As shown in Figure 1.

A storage method for extracting big data from the Internet, comprising the following steps:

The capacity of each database is less than or equal to 1TB.

All data units are stored sequentially in time-axis order.

The present invention can be well implemented as described above.

Claims

1. A storage method for extracting big data from the Internet, characterized in that it comprises the following steps:

2. The storage method for extracting big data from the Internet according to claim 1, characterized in that the capacity of each database is less than or equal to 1TB.

3. The storage method for extracting big data from the Internet according to claim 1, characterized in that, after the capacity distributor finishes storing the data units, it makes a key from the storage location of each data unit and forwards the key to the client.

4. The storage method for extracting big data from the Internet according to claim 1, characterized in that all data units are stored sequentially in time-axis order.

5. The storage method for extracting big data from the Internet according to claim 1, characterized in that, before storing data units, the capacity distributor screens out and temporarily holds the data units whose capacity is greater than 2GB and stores the data units smaller than 2GB first; once all data units smaller than 2GB have been stored, it begins storing the data units larger than 2GB.