CN104765871A - Storage method for extracting big data from Internet - Google Patents

Info

Publication number
CN104765871A
Authority
CN
China
Prior art keywords
data
database
capacity
internet
data cell
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510200122.XA
Other languages
Chinese (zh)
Inventor
严澜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Chuan Hang Information technology company limited
Suzhou Chong Xing Mdt InfoTech Ltd
Original Assignee
Chengdu Chuan Hang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Chuan Hang Information Technology Co Ltd
Priority to CN201510200122.XA
Publication of CN104765871A
Legal status: Pending

Abstract

The invention discloses a storage method for extracting big data from the Internet. The method comprises: a data request step of entering a retrieval keyword at a client, setting a retrieval range, and obtaining data related to the retrieval keyword from the Internet within the retrieval range; a download step in which the client downloads all data related to the retrieval keyword and forwards it to a capacity distributor; and a forwarding and storing step in which the capacity distributor divides the data related to the retrieval keyword into a plurality of independent data units, records the capacity of each data unit, and stores the data units sequentially, in time-axis order, in a database set comprising M independent databases. After the capacity distributor stores the current data unit in the Nth database, the Nth database returns its remaining-capacity information; when the remaining capacity of the Nth database is smaller than the capacity of the next data unit, the capacity distributor starts storing data units in the (N+1)th database.

Description

Storage method for extracting big data from the Internet
Technical field
The present invention relates to the technical field of big data processing, and specifically to a storage method for extracting big data from the Internet.
Background art
For data processing enterprises, and particularly in big data processing, data must be extracted and assembled into a database holding one class of data, and such a database has a very large capacity. When assembling such a database, the most important parameter is therefore its capacity. Conventional databases generally have a small capacity. Large databases have a large capacity but are very expensive to build: for example, a database of about 2 TB generally costs several hundred thousand yuan to build, and assembling a 20 TB database costs millions. For a typical small business this is a huge expense. A database construction method is therefore needed that reduces cost while keeping the storage of the data continuous.
Summary of the invention
The object of the present invention is to provide a storage method for extracting big data from the Internet that builds a large-capacity database at low cost while keeping the database continuous.
This object is achieved through the following technical solution: a storage method for extracting big data from the Internet, comprising the following steps:
Data request step: a retrieval keyword is entered at the client and a retrieval range is set; data related to the retrieval keyword is obtained from the Internet within the retrieval range;
Download step: the client downloads all data related to the retrieval keyword and forwards it to the capacity distributor;
Forwarding and storing step: the capacity distributor divides the data related to the retrieval keyword into several independent data units and records the capacity of each data unit, while storing the data units one after another, in time-axis order, in a database set. The database set comprises M independent databases: database 1, database 2, ..., database M. After the capacity distributor stores the current data unit in the Nth database, the Nth database returns its remaining-capacity information. When the remaining capacity of the Nth database is smaller than the capacity of the next data unit, the capacity distributor starts storing data units in the (N+1)th database, and so on, until all data related to the retrieval keyword has been stored. N and M are positive integers.
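The forwarding and storing step can be illustrated with a minimal Python sketch. The class and function names (`Database`, `distribute`), the use of gigabytes as the unit, and the error raised when every database is full are assumptions made for illustration; the text above does not say what happens when the database set runs out of space.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """One small, independent database with a fixed capacity (in GB)."""
    capacity_gb: float
    units: list = field(default_factory=list)
    used_gb: float = 0.0

    def remaining_gb(self) -> float:
        return self.capacity_gb - self.used_gb

    def store(self, name: str, size_gb: float) -> float:
        """Store one data unit and return the remaining capacity."""
        self.units.append(name)
        self.used_gb += size_gb
        return self.remaining_gb()


def distribute(units, databases):
    """Fill databases 1..M in order.

    `units` is a time-ordered list of (name, size_gb) pairs.
    Returns a list of (name, database_number) placements.
    """
    placements = []
    n = 0  # index of the current (Nth) database
    for name, size_gb in units:
        # Move on to the next database while the current one
        # cannot hold the next data unit.
        while n < len(databases) and databases[n].remaining_gb() < size_gb:
            n += 1
        if n == len(databases):
            # Assumption: this case is not covered by the description.
            raise RuntimeError("database set is full")
        databases[n].store(name, size_gb)
        placements.append((name, n + 1))
    return placements


dbs = [Database(capacity_gb=5) for _ in range(3)]
units = [("a", 2), ("b", 2), ("c", 2), ("d", 3), ("e", 1)]
print(distribute(units, dbs))
# [('a', 1), ('b', 1), ('c', 2), ('d', 2), ('e', 3)]
```

Because a database is left as soon as the next unit does not fit, some capacity at the end of each database may stay unused; the sketch accepts this in exchange for keeping the units in time-axis order.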
The design principle of the above method is as follows. The database set in the present invention comprises M independent databases, all of them small-capacity databases. Using the above storage method, these low-cost, small-capacity databases are assembled into a database able to hold big data, replacing a conventional large-capacity database. Each independent database costs only a few thousand yuan to build, and a database built by this method still keeps the data continuous during storage.
To illustrate the advantage of the invention, consider an example: we want to assemble a database whose retrieval range is "science fiction films". The number of science fiction films on the Internet is huge, so a large amount of storage capacity is needed. Suppose a single science fiction film is 2 GB and there are 10,000 science fiction films on the Internet; the target database then needs a total capacity of 20 TB.
Under the existing approach to building large databases, three 8 TB databases are used to store the data. The three 8 TB databases are independent, with no association or continuity between them, and the data is stored in no particular order. To fetch any one item, the whole database has to be locked, so retrieval takes longer.
With the method of the present invention, twenty small 1 TB databases are used. At 3,000 yuan per database, the whole database costs 60,000 yuan, whereas a single existing 8 TB database costs several hundred thousand, because an 8 TB database has higher computing and caching requirements. Once the twenty databases and the capacity distributor are set up, the capacity distributor stores the science fiction film data from the Internet in time-axis order, generates keys, and forwards them to the client. At retrieval time, we first look up the key; once the matching key is found, we search the independent database that corresponds to it and finally fetch the requested content from that database.
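The figures in this example can be checked with a few lines of Python. The constant names are hypothetical, and the sketch assumes decimal units (1 TB = 1,000 GB), which is what the 20 TB total implies. No cost is computed for the 8 TB databases because the text gives only an order of magnitude for them.

```python
import math

FILM_SIZE_GB = 2            # from the example: one film is 2 GB
FILM_COUNT = 10_000         # from the example: 10,000 films
GB_PER_TB = 1_000           # assumption: decimal units
SMALL_DB_TB = 1             # capacity of each small database
SMALL_DB_COST_YUAN = 3_000  # cost per small database in the example
LARGE_DB_TB = 8             # capacity of each conventional large database

total_tb = FILM_SIZE_GB * FILM_COUNT / GB_PER_TB
small_db_count = math.ceil(total_tb / SMALL_DB_TB)
small_total_cost = small_db_count * SMALL_DB_COST_YUAN
large_db_count = math.ceil(total_tb / LARGE_DB_TB)

print(total_tb)          # 20.0
print(small_db_count)    # 20
print(small_total_cost)  # 60000
print(large_db_count)    # 3
```

Note that twenty 1 TB databases hold exactly 20 TB only if no capacity is lost to overhead or to partially filled databases, which the example does not account for.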
The capacity of each database is less than or equal to 1 TB.
After the capacity distributor finishes storing the data units, the storage location of each data unit is made into a key, and the key is forwarded to the client.
All data units are stored one after another in time-axis order.
Before storing data units, the capacity distributor screens out the data units larger than 2 GB and holds them temporarily. Data units smaller than 2 GB are stored first; once all of them have been stored, the capacity distributor starts storing the data units larger than 2 GB.
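The screening of large data units could be sketched as follows. The function name `order_for_storage` is hypothetical. The text does not say how a unit of exactly 2 GB is handled, so the sketch assumes it is stored with the smaller units; each group keeps its original time-axis order.

```python
THRESHOLD_GB = 2


def order_for_storage(units):
    """Return the units in the order the capacity distributor would store them.

    `units` is a time-ordered list of (name, size_gb) pairs.
    Units larger than the threshold are held back until all others are stored.
    Assumption: a unit of exactly 2 GB is treated as a small unit.
    """
    small = [u for u in units if u[1] <= THRESHOLD_GB]
    large = [u for u in units if u[1] > THRESHOLD_GB]
    return small + large


units = [("a", 3.0), ("b", 0.5), ("c", 2.0), ("d", 4.0), ("e", 1.0)]
print(order_for_storage(units))
# [('b', 0.5), ('c', 2.0), ('e', 1.0), ('a', 3.0), ('d', 4.0)]
```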
The advantages of the invention are low cost and good continuity of data storage.
Description of the drawings
Fig. 1 is a schematic diagram of data storage according to the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to an embodiment and the accompanying drawing, but the embodiments of the invention are not limited thereto.
Embodiment 1:
As shown in Figure 1.
A storage method for extracting big data from the Internet comprises the following steps:
Data request step: a retrieval keyword is entered at the client and a retrieval range is set; data related to the retrieval keyword is obtained from the Internet within the retrieval range;
Download step: the client downloads all data related to the retrieval keyword and forwards it to the capacity distributor;
Forwarding and storing step: the capacity distributor divides the data related to the retrieval keyword into several independent data units and records the capacity of each data unit, while storing the data units one after another, in time-axis order, in a database set. The database set comprises M independent databases: database 1, database 2, ..., database M. After the capacity distributor stores the current data unit in the Nth database, the Nth database returns its remaining-capacity information. When the remaining capacity of the Nth database is smaller than the capacity of the next data unit, the capacity distributor starts storing data units in the (N+1)th database, and so on, until all data related to the retrieval keyword has been stored. N and M are positive integers.
The design principle of the above method is as follows. The database set in the present invention comprises M independent databases, all of them small-capacity databases. Using the above storage method, these low-cost, small-capacity databases are assembled into a database able to hold big data, replacing a conventional large-capacity database. Each independent database costs only a few thousand yuan to build, and a database built by this method still keeps the data continuous during storage.
To illustrate the advantage of the invention, consider an example: we want to assemble a database whose retrieval range is "science fiction films". The number of science fiction films on the Internet is huge, so a large amount of storage capacity is needed. Suppose a single science fiction film is 2 GB and there are 10,000 science fiction films on the Internet; the target database then needs a total capacity of 20 TB.
Under the existing approach to building large databases, three 8 TB databases are used to store the data. The three 8 TB databases are independent, with no association or continuity between them, and the data is stored in no particular order. To fetch any one item, the whole database has to be locked, so retrieval takes longer.
With the method of the present invention, twenty small 1 TB databases are used. At 3,000 yuan per database, the whole database costs 60,000 yuan, whereas a single existing 8 TB database costs several hundred thousand, because an 8 TB database has higher computing and caching requirements. Once the twenty databases and the capacity distributor are set up, the capacity distributor stores the science fiction film data from the Internet in time-axis order, generates keys, and forwards them to the client. At retrieval time, we first look up the key; once the matching key is found, we search the independent database that corresponds to it and finally fetch the requested content from that database.
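The key-based retrieval described for this embodiment might look like the sketch below. The patent does not specify what a key contains, so the `Key` layout (database number plus position within that database), the function names, and the sample film names are all assumptions for illustration.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical key: where one data unit is stored."""
    database_no: int  # which independent database holds the unit
    slot: int         # position of the unit inside that database


def make_keys(databases):
    """Build the keys that the capacity distributor forwards to the client."""
    keys = {}
    for db_no, units in databases.items():
        for slot, name in enumerate(units):
            keys[name] = Key(db_no, slot)
    return keys


def retrieve(name, keys, databases):
    """Look up the key first, then read only the database it points to."""
    key = keys[name]
    return databases[key.database_no][key.slot]


# Two small databases, already filled in time-axis order.
databases = {1: ["film-0001", "film-0002"], 2: ["film-0003"]}
client_keys = make_keys(databases)

print(client_keys["film-0003"])                      # Key(database_no=2, slot=0)
print(retrieve("film-0003", client_keys, databases))  # film-0003
```

The point of the design is that a lookup touches the client-side key table and one small database, rather than searching every database in the set.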
The capacity of each database is less than or equal to 1 TB.
After the capacity distributor finishes storing the data units, the storage location of each data unit is made into a key, and the key is forwarded to the client.
All data units are stored one after another in time-axis order.
Before storing data units, the capacity distributor screens out the data units larger than 2 GB and holds them temporarily. Data units smaller than 2 GB are stored first; once all of them have been stored, the capacity distributor starts storing the data units larger than 2 GB.
The present invention can be readily implemented as described above.

Claims (5)

1. A storage method for extracting big data from the Internet, characterized by comprising the following steps:
Data request step: a retrieval keyword is entered at the client and a retrieval range is set; data related to the retrieval keyword is obtained from the Internet within the retrieval range;
Download step: the client downloads all data related to the retrieval keyword and forwards it to the capacity distributor;
Forwarding and storing step: the capacity distributor divides the data related to the retrieval keyword into several independent data units and records the capacity of each data unit, while storing the data units one after another, in time-axis order, in a database set; the database set comprises M independent databases, namely database 1, database 2, ..., database M; after the capacity distributor stores the current data unit in the Nth database, the Nth database returns its remaining-capacity information; when the remaining capacity of the Nth database is smaller than the capacity of the next data unit, the capacity distributor starts storing data units in the (N+1)th database, and so on, until all data related to the retrieval keyword has been stored; N and M are positive integers.
2. The storage method for extracting big data from the Internet according to claim 1, characterized in that the capacity of each database is less than or equal to 1 TB.
3. The storage method for extracting big data from the Internet according to claim 1, characterized in that after the capacity distributor finishes storing the data units, the storage location of each data unit is made into a key, and the key is forwarded to the client.
4. The storage method for extracting big data from the Internet according to claim 1, characterized in that all data units are stored one after another in time-axis order.
5. The storage method for extracting big data from the Internet according to claim 1, characterized in that before storing data units, the capacity distributor screens out the data units larger than 2 GB and holds them temporarily; data units smaller than 2 GB are stored first, and once all of them have been stored, the capacity distributor starts storing the data units larger than 2 GB.
CN201510200122.XA 2015-04-26 2015-04-26 Storage method for extracting big data from Internet Pending CN104765871A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510200122.XA CN104765871A (en) 2015-04-26 2015-04-26 Storage method for extracting big data from Internet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510200122.XA CN104765871A (en) 2015-04-26 2015-04-26 Storage method for extracting big data from Internet

Publications (1)

Publication Number Publication Date
CN104765871A true CN104765871A (en) 2015-07-08

Family

ID=53647699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510200122.XA Pending CN104765871A (en) 2015-04-26 2015-04-26 Storage method for extracting big data from Internet

Country Status (1)

Country Link
CN (1) CN104765871A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101184104A (en) * 2007-12-21 2008-05-21 腾讯科技(深圳)有限公司 Distributed memory system and method
CN102158540A (en) * 2011-02-18 2011-08-17 广州从兴电子开发有限公司 System and method for realizing distributed database
CN103152395A (en) * 2013-02-05 2013-06-12 北京奇虎科技有限公司 Storage method and device of distributed file system

Similar Documents

Publication Publication Date Title
CN104252536B (en) A kind of internet log data query method and device based on hbase
US9805079B2 (en) Executing constant time relational queries against structured and semi-structured data
CN110413611B (en) Data storage and query method and device
US10558495B2 (en) Variable sized database dictionary block encoding
TWI613555B (en) Search method and device
CN102024047B (en) Data searching method and device thereof
CN103678491A (en) Method based on Hadoop small file optimization and reverse index establishment
CN102332030A (en) Data storing, managing and inquiring method and system for distributed key-value storage system
CN107357843B (en) Massive network data searching method based on data stream structure
CN111046034A (en) Method and system for managing memory data and maintaining data in memory
US20100274795A1 (en) Method and system for implementing a composite database
CN102375853A (en) Distributed database system, method for building index therein and query method
CN100458784C (en) Researching system and method used in digital labrary
CN102024019B (en) Suffix tree based catalog organizing method in distributed file system
CN106599091B (en) RDF graph structure storage and index method based on key value storage
US9262511B2 (en) System and method for indexing streams containing unstructured text data
CN104778182B (en) Data lead-in method and system based on HBase
CN103714096A (en) Lucene-based inverted index system construction method and device, and Lucene-based inverted index system data processing method and device
CN102968456B (en) A kind of raster data reading and processing method and device
CN103353901A (en) Orderly table data management method and system based on Hadoop distributed file system (HDFS)
CN101833511B (en) Data management method, device and system
CN108319634B (en) Directory access method and device for distributed file system
CN104268158A (en) Structural data distributed index and retrieval method
CN104881475A (en) Method and system for randomly sampling big data
CN110633261A (en) Picture storage method, picture query method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20170621

Address after: 610000 Chengdu high tech Zone, Sichuan Tianyi street, No. 3, building 38

Applicant after: Chengdu Chuan Hang Information technology company limited

Applicant after: Suzhou Chong Xing Mdt InfoTech Ltd

Address before: 610000 Chengdu high tech Zone, Sichuan Tianyi street, No. 3, building 38

Applicant before: Chengdu Chuan Hang Information technology company limited

TA01 Transfer of patent application right
RJ01 Rejection of invention patent application after publication

Application publication date: 20150708

RJ01 Rejection of invention patent application after publication