CN106649847A - A large data real-time processing system based on Hadoop - Google Patents

Info

Publication number
CN106649847A
Authority
CN
China
Prior art keywords
data
query
real
time
hadoop
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611255956.1A
Other languages
Chinese (zh)
Inventor
陈嵩荣
郑志伟
张木辉
蔡剑齐
王晓强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Linewell Software Co Ltd
Original Assignee
Linewell Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Linewell Software Co Ltd filed Critical Linewell Software Co Ltd
Priority to CN201611255956.1A priority Critical patent/CN106649847A/en
Publication of CN106649847A publication Critical patent/CN106649847A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/14Details of searching files based on file metadata
    • G06F16/148File search processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a big data real-time processing system based on Hadoop. In embodiments of the system, index filtering is performed while the query task is being created, and the filtered index files are distributed to the datanodes while filtering is still in progress; each datanode concurrently queries its local files and returns the query result to the client. As soon as any datanode completes its query, the query result can be returned quickly to the client through the periodic polling mechanism of the real-time transmission middleware. The data query process in HDFS is executed concurrently throughout, which makes maximum use of the computer hardware, allows queries to complete in real time, and greatly improves query efficiency. Users obtain query results while the query operation is executing, so client query requests receive a quick response.

Description

A big data real-time processing system based on Hadoop
Technical field
The present invention relates to the field of computer technology, and in particular to a big data real-time processing system based on Hadoop.
Background
With the development of informatization, the data that enterprises must process is growing explosively, and data volumes have reached very large scale (for example, from the TB level to the PB level), which brings a series of problems. As data volume grows, system load rises and the performance of data ingestion and querying declines. How to get the maximum performance out of a system and make ingestion and querying as fast as possible without increasing hardware cost is a difficult problem facing many enterprises.
The emergence of cloud computing provides an effective approach to processing massive data. Common cloud computing solutions include framework designs based on Hadoop, which comprises the Hadoop Distributed File System (HDFS) and MapReduce. HDFS provides storage for massive data, and MapReduce provides computation over it. HDFS makes massive data storage straightforward while effectively preventing single points of failure and avoiding unnecessary loss. However, the conventional way to retrieve data on HDFS is to launch a global MapReduce search, which requires large-scale parallel computation and a full pass over all data stored on HDFS. In cloud computing, especially with massive data, using MapReduce for global search on HDFS as in the prior art wastes a great deal of system resources and takes a great deal of time.
Summary of the invention
The object of the present invention is to provide a big data real-time processing system based on Hadoop that improves data query efficiency and responds quickly to client query requests.
To achieve the above object, the present invention adopts the following technical solution:
The present invention provides a big data real-time processing system based on Hadoop, comprising: a client, a real-time transmission middleware, and a distributed file system HDFS, wherein:
The HDFS comprises a control node (namenode) and multiple data nodes (datanodes);
The control node is configured to start multiple threads on the multiple data nodes, create in real time an index for each of the multiple data items to be ingested, and store the indexes in multiple index files according to creation time;
The client is configured to send a data acquisition (get) request to the HDFS through the real-time transmission middleware;
The real-time transmission middleware is configured to forward the data acquisition request sent by the client to the control node;
The control node is configured to: create a query task according to the data acquisition request sent by the client, the query task comprising the query condition that the target data must satisfy, and the query condition comprising a query time condition; match the query time condition in the query condition against the multiple index files and filter out the index files that satisfy the query time condition; distribute the query task to the multiple data nodes and query the data nodes according to the filtered index files and the query condition, thereby obtaining the positions of the data that satisfy the query condition; distribute the query task to the multiple data nodes again and read data on the data nodes according to those positions; and return a query result as soon as any one of the data nodes completes its query successfully;
The real-time transmission middleware is configured to poll the query result directory at a preset polling period and, if the query result directory is not empty, read the query result files in the directory and return them to the client;
The client is configured to obtain the query result files in real time through the real-time transmission middleware.
With the above technical solution, the present invention has the following advantages:
The Hadoop-based big data real-time processing system provided by embodiments of the present invention processes big data in real time: data ingestion, querying, and transmission within the system are all concurrent and real-time. In embodiments of the present invention, index filtering is performed while the query task is being created, and the filtered index files can be distributed to the datanodes while filtering is still in progress; at the same time, each datanode queries its local files and returns the query result to the client. As soon as any datanode completes its query, the periodic polling mechanism of the real-time transmission middleware can quickly return the query result to the client. The data query process in HDFS is executed concurrently throughout, which makes maximum use of the computer hardware, allows queries to complete in real time, and greatly improves query efficiency; the user obtains query results while the query operation is executing, and client query requests receive a quick response.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a Hadoop-based big data real-time processing system provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the jetty-based query flow provided by an embodiment of the present invention.
Detailed description
Embodiments of the present invention provide a big data real-time processing system based on Hadoop that improves data query efficiency and responds quickly to client query requests.
To make the objects, features, and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The embodiments described below are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention fall within the scope of protection of the present invention.
The terms "comprising" and "having" and any variations of them in the specification, the claims, and the above drawings are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device comprising a series of units is not necessarily limited to those units, but may include other units not expressly listed or inherent to the process, method, product, or device.
Each is described in detail below.
An embodiment of the Hadoop-based big data real-time processing system of the present invention enables fast, real-time data queries within a distributed system architecture. Embodiments of the present invention overcome the shortcomings of the data processing methods commonly used in prior-art cloud computing solutions, namely wasted system resources and long processing times, and provide an effective method for processing massive data in real time. In embodiments of the present invention, data ingestion, querying, and transmission are all concurrent and real-time. Referring to Fig. 1, the Hadoop-based big data real-time processing system provided by the present invention comprises: a client, a real-time transmission middleware, and a distributed file system (Hadoop Distributed File System, HDFS), wherein:
The HDFS comprises a control node (namenode) and multiple data nodes (datanodes);
The control node is configured to start multiple threads on the multiple data nodes, create in real time an index for each of the multiple data items to be ingested, and store the indexes in multiple index files according to creation time;
The client is configured to send a data acquisition (get) request to the HDFS through the real-time transmission middleware;
The real-time transmission middleware is configured to forward the data acquisition request sent by the client to the control node;
The control node is configured to: create a query task according to the data acquisition request sent by the client, the query task comprising the query condition that the target data must satisfy, and the query condition comprising a query time condition; match the query time condition in the query condition against the multiple index files and filter out the index files that satisfy the query time condition; distribute the query task to the multiple data nodes and query the data nodes according to the filtered index files and the query condition, thereby obtaining the positions of the data that satisfy the query condition; distribute the query task to the multiple data nodes again and read data on the data nodes according to those positions; and return a query result as soon as any one of the data nodes completes its query successfully;
The real-time transmission middleware is configured to poll the query result directory at a preset polling period and, if the query result directory is not empty, read the query result files in the directory and return them to the client;
The client is configured to obtain the query result files in real time through the real-time transmission middleware.
In embodiments of the present invention, HDFS is implemented on Hadoop. HDFS is highly fault-tolerant and provides high-throughput access to application data, which suits applications with large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed in a streaming fashion (streaming access).
The real-time transmission middleware sits between the client and HDFS, and all interaction between the two, such as forwarding query requests and forwarding query results, goes through it. In some embodiments of the present invention, the real-time transmission middleware specifically uses jetty as the network (web) container. Jetty is an open-source servlet container; as a Java-based web container, it provides a runtime environment for components such as JSP and servlets. A servlet (server applet, in full Java Servlet) is a server-side program written in Java whose main function is to interactively browse and modify data and generate dynamic web content.
In embodiments of the present invention, HDFS comprises a control node and multiple data nodes. A data node is a data storage unit: data that the underlying interface transmits for storage can be stored on the data nodes, and when a client needs to read data from HDFS, the data is read from the data nodes. The control node may be divided into a primary control node and a standby control node, so that the primary/standby arrangement ensures HDFS can respond to client requests promptly. The multiple data nodes in HDFS execute the query task independently, and each data node reports its query result independently. As soon as any data node completes its query, the query result can be fed back to the client through the real-time transmission middleware, without waiting for all data nodes to finish and then returning a unified result, so data queries are highly efficient.
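The independent, first-finished-first-returned behavior of the data nodes can be illustrated with a minimal Python sketch. It is not the patented implementation: the node names, the in-memory record lists, and the functions `query_datanode` and `stream_results` are all hypothetical, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def query_datanode(node_name, records, predicate):
    """Hypothetical per-node query: scan only this node's local records."""
    return node_name, [r for r in records if predicate(r)]


def stream_results(nodes, predicate):
    """Yield (node_name, rows) as each node finishes, without waiting for the rest."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = [
            pool.submit(query_datanode, name, recs, predicate)
            for name, recs in nodes.items()
        ]
        # as_completed hands back futures in completion order, not submission order.
        for fut in as_completed(futures):
            yield fut.result()


# Three simulated data nodes, each holding its own local records (assumed data).
nodes = {"dn1": [1, 5, 9], "dn2": [2, 6], "dn3": [7, 8]}
results = dict(stream_results(nodes, lambda r: r > 5))
```

Because the generator yields each node's rows as soon as that node is done, a consumer can forward partial results immediately, which is the property the paragraph above describes.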
The Hadoop-based big data real-time processing system provided by embodiments of the present invention processes big data in real time: data ingestion, querying, and transmission within the system are all concurrent and real-time. In embodiments of the present invention, index filtering is performed while the query task is being created, and the filtered index files can be distributed to the datanodes while filtering is still in progress; at the same time, each datanode queries its local files and returns the query result to the client. As soon as any datanode completes its query, the periodic polling mechanism of the real-time transmission middleware can quickly return the query result to the client. The data query process in HDFS is executed concurrently throughout, which makes maximum use of the computer hardware, allows queries to complete in real time, and greatly improves query efficiency; the user obtains query results while the query operation is executing, and client query requests receive a quick response.
In some embodiments of the present invention, the control node is specifically configured to create in real time, using a B+ tree structure, an index for each of the multiple data items to be ingested; and,
The control node is specifically configured to query the multiple data nodes through the B+ tree according to the filtered index files and the query condition, thereby obtaining the positions of the data that satisfy the query condition.
The Hadoop-based big data real-time processing system in embodiments of the present invention supports real-time data ingestion. Building on the existing HDFS, multiple threads are started on each datanode to create indexes, so index files are created in parallel. Indexes are built on certain important fields and organized as a B+ tree, so each new record only needs to be inserted into the B+ tree. Insertion into a B+ tree takes place only at leaf nodes. After each (key, pointer) index entry is inserted, the number of subtrees in the node must be checked against the allowed range. When the number of subtrees in the node after insertion exceeds m (the order of the B+ tree), the leaf node must be split into two nodes, and the parent node must then contain the maximum key and the node address of each of the two nodes. The problem thereby reduces to an insertion into a non-leaf node. Inserting a key into a non-leaf node is similar to inserting into a leaf node: the upper limit on the number of subtrees in a non-leaf node is also m, and a node that exceeds it must likewise be split. When the root node splits, it has no parent, so a new parent node must be created as the new root of the tree.
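The insertion procedure described above can be sketched in Python as follows. This is an illustrative in-memory version under stated assumptions, not the patent's on-disk index format: the class names `Node` and `BPlusTree`, the default order of 4, and the use of an integer as the record position are hypothetical. It follows the variant in the text in which each parent entry holds the maximum key of the corresponding child.

```python
import bisect


class Node:
    def __init__(self, leaf=True):
        self.leaf = leaf
        self.keys = []      # leaf: record keys; internal: max key of each child
        self.children = []  # leaf: record positions; internal: child nodes


class BPlusTree:
    def __init__(self, order=4):
        self.m = order  # upper limit on entries (subtrees) per node
        self.root = Node(leaf=True)

    def insert(self, key, pos):
        sibling = self._insert(self.root, key, pos)
        if sibling is not None:
            # The root split and has no parent, so create a new root.
            new_root = Node(leaf=False)
            new_root.keys = [self.root.keys[-1], sibling.keys[-1]]
            new_root.children = [self.root, sibling]
            self.root = new_root

    def _insert(self, node, key, pos):
        if node.leaf:
            i = bisect.bisect_left(node.keys, key)
            node.keys.insert(i, key)
            node.children.insert(i, pos)
        else:
            # Descend into the first child whose max key >= key, else the last child.
            i = min(bisect.bisect_left(node.keys, key), len(node.keys) - 1)
            child = node.children[i]
            sibling = self._insert(child, key, pos)
            node.keys[i] = child.keys[-1]  # refresh the child's max key
            if sibling is not None:
                # The child split: record the new node's max key and address.
                node.keys.insert(i + 1, sibling.keys[-1])
                node.children.insert(i + 1, sibling)
        return self._split(node) if len(node.keys) > self.m else None

    def _split(self, node):
        mid = len(node.keys) // 2
        right = Node(leaf=node.leaf)
        right.keys, node.keys = node.keys[mid:], node.keys[:mid]
        right.children, node.children = node.children[mid:], node.children[:mid]
        return right

    def search(self, key):
        node = self.root
        while not node.leaf:
            i = bisect.bisect_left(node.keys, key)
            if i == len(node.keys):
                return None  # key is larger than every key in the tree
            node = node.children[i]
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.children[i]
        return None


tree = BPlusTree(order=4)
for k in [15, 3, 8, 20, 1, 12, 7, 18, 5, 10, 2, 17]:
    tree.insert(k, k * 100)  # k * 100 stands in for a record position
```

Splitting is handled on the way back up the recursion, so a leaf split turns into an insertion in its parent, and a root split creates a new root, matching the three cases in the text.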
In some embodiments of the present invention, the client is further configured to continue by sending a data acquisition continuation (continue) request to the HDFS;
The HDFS is further configured to respond to the continuation request through the control node, starting multiple threads to obtain from the multiple data nodes the query result corresponding to the continuation request;
The real-time transmission middleware is further configured to read, at the preset polling period, the query result corresponding to the continuation request and return it to the client.
In embodiments of the present invention, the real-time transmission middleware may use jetty as the web container. While the data query is running on HDFS, jetty polls the query result directory; if the directory is not empty, it reads the query result files and returns them to the client. The client then sends a data acquisition continuation (continue) request to the HDFS side, the control node starts multiple threads to read the query results, and the results read are returned to the client through jetty. If the data returned is empty, the flow ends; if it is not empty, the client sends another continue request. During the query, data is returned to the client as soon as any datanode completes its query successfully, without waiting for all datanodes to finish.
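The get/continue exchange can be sketched as a small Python simulation. It is only a model of the control flow: `ResultDirectory`, `serve_request`, and `client_fetch_all` are hypothetical names, an in-memory queue stands in for the query result directory, and no HTTP or jetty calls are involved.

```python
import time
from collections import deque


class ResultDirectory:
    """Stand-in for the query result directory that the middleware polls."""

    def __init__(self):
        self.files = deque()         # each entry models one result file's rows
        self.query_finished = False  # set once the query on HDFS has ended


def serve_request(result_dir, poll_interval=0.01):
    """Middleware side: poll until a result file exists or the query is finished."""
    while True:
        if result_dir.files:
            return result_dir.files.popleft()
        if result_dir.query_finished:
            return []  # empty response tells the client the flow has ended
        time.sleep(poll_interval)


def client_fetch_all(result_dir):
    """Client side: one get request, then continue requests until data is empty."""
    collected = []
    rows = serve_request(result_dir)      # initial get request
    while rows:
        collected.extend(rows)
        rows = serve_request(result_dir)  # continue request
    return collected


demo_dir = ResultDirectory()
demo_dir.files.extend([[1, 2], [3]])
demo_dir.query_finished = True
fetched = client_fetch_all(demo_dir)
```

The client never needs to know how many data nodes exist; it simply keeps asking until it receives an empty response.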
Fig. 2 shows the jetty-based query flow in embodiments of the invention, with jetty as the web container. First, the client sends a get request to the HDFS side, and the control node parses the JSON string (JSON is a lightweight data interchange format). A job object is instantiated from the query condition in the JSON string, and the job is submitted to perform the distributed query, which finally returns the results. While the query is running on HDFS, jetty polls the query result directory; if it is not empty, jetty reads the files and returns them to the client. The client then sends a continue request to the HDFS side, and the control node starts multiple threads to read the query results and return the data read to the client. If the query result is empty and the query on HDFS has finished, an empty result is returned and the flow ends. If any step fails, the flow returns immediately; failures include, for example, exceptions, request errors, and a missing index file directory.
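The step of turning the JSON query string into a job object might look like the following Python sketch. The patent does not specify the JSON schema, so the field names `start_time`, `end_time`, and `filters`, as well as the `QueryJob` class and `parse_get_request` function, are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class QueryJob:
    """Hypothetical job object built from the query condition."""
    start_time: str
    end_time: str
    filters: dict = field(default_factory=dict)


def parse_get_request(body: str) -> QueryJob:
    """Parse the JSON string of a get request and instantiate a job object."""
    cond = json.loads(body)  # raises json.JSONDecodeError on a malformed request
    if "start_time" not in cond or "end_time" not in cond:
        # The query time condition is mandatory, so reject requests without it.
        raise ValueError("query time condition is required")
    return QueryJob(cond["start_time"], cond["end_time"], cond.get("filters", {}))


job = parse_get_request(
    '{"start_time": "2016-12-01", "end_time": "2016-12-30", "filters": {"user": "a"}}'
)
```

Failing fast on a malformed or incomplete request corresponds to the "request error" exit in the flow above.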
In some embodiments of the present invention, the client is further configured to send an end request to the real-time transmission middleware;
The real-time transmission middleware is further configured to stop polling the query result directory after receiving the end request sent by the client.
If the client sends an end request to the real-time transmission middleware, the middleware no longer returns query results to the client. The client's request is thus honored promptly, the occupation of transmission resources is reduced, and resource utilization improves.
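One way to make a polling loop stoppable by an end request is a shared stop flag, sketched below in Python. The `ResultPoller` class and its method names are hypothetical, and the callable passed in merely simulates listing the query result directory.

```python
import threading


class ResultPoller:
    """Hypothetical middleware poller that an end request can stop."""

    def __init__(self, list_result_files, poll_interval=0.01):
        self.list_result_files = list_result_files  # callable returning pending files
        self.poll_interval = poll_interval
        self.delivered = []
        self._stop = threading.Event()

    def run(self):
        # Poll until an end request sets the stop flag.
        while not self._stop.is_set():
            self.delivered.extend(self.list_result_files())
            # wait() returns early if the end request arrives during the pause.
            self._stop.wait(self.poll_interval)

    def handle_end_request(self):
        self._stop.set()


pending = [["part-0"], ["part-1"]]
poller = ResultPoller(lambda: pending.pop(0) if pending else [])
worker = threading.Thread(target=poller.run)
worker.start()
poller.handle_end_request()  # the client sends its end request
worker.join(timeout=2)
```

Using an event rather than a plain sleep means the poller reacts to the end request within the current polling period instead of after it.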
In some embodiments of the present invention, the control node is specifically configured to, after obtaining the positions of the data that satisfy the query condition, determine from those positions the Internet Protocol (IP) address of the data node where the target data resides and an offset, find the data node storing the target data according to the IP address, and then find the target data on that data node according to the offset.
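A position made of an IP address and an offset can be resolved as in the Python sketch below. The storage layout is an assumption: each data node is modeled as a byte string of newline-terminated records keyed by a made-up IP address, and `read_record` is a hypothetical helper.

```python
import io


def read_record(stores, location):
    """Resolve a (ip, offset) position to the record stored there."""
    ip, offset = location
    node_file = io.BytesIO(stores[ip])  # step 1: find the data node by IP address
    node_file.seek(offset)              # step 2: jump to the record by offset
    return node_file.readline().rstrip(b"\n")


# Simulated local files on two data nodes (addresses and contents are invented).
stores = {
    "10.0.0.1": b"alpha\nbeta\n",
    "10.0.0.2": b"gamma\n",
}
record = read_record(stores, ("10.0.0.1", 6))
```

Because the offset points directly at the record, no scan of the node's other data is needed once the index lookup has produced the position.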
Big data real time processing system in the embodiment of the present invention based on Hadoop can realize real-time query:Using distribution Formula computing system, creates at control node end and submits to query task (job) to be inquired about, and inquiry is divided into following process:First Filtration is indexed in control node, because the title of index file was created according to the time, according in querying condition The title of query time condition and index file is matched, and screening meets the index file of condition.Query task is distributed to On every datanode, according to the index file and querying condition for filtering out by B+ tree queries, the data of condition are met Position, the distribution of task is carried out again, data are read on every machine according to the position of data obtained in the previous step, and return Return Query Result.Efficiently B+ structures and the executed in parallel of inquiry, have reached inquiry and complete in real time.Wherein, the position note of data The IP address and side-play amount at data storage place machine (datanode) are recorded, machine has been found according to IP address, further according to skew Amount can just find corresponding data.
In some embodiments of the present invention, the query time condition comprises a query start time and a query end time. A query time condition must be present in the query condition, so the query start time and query end time set by the client can be obtained. The query task is then executed according to the client's data acquisition request: it is started according to the query start time and ended according to the query end time.
All processes in embodiments of the present invention execute concurrently, which makes maximum use of the computer hardware and greatly improves processing efficiency, so that the user obtains query results while the query operation is executing. The present invention covers real-time ingestion, real-time querying, and real-time transmission of results, all of which are concurrent. While the task is being created, index filtering is performed; while indexes are being filtered, the filtered index files are distributed to the datanodes; at the same time, each datanode queries its local files and returns data to the client. As soon as any datanode completes its query, the query result is returned to the user. The efficient B+ tree structure and the parallel execution of queries allow queries to complete in real time and greatly improve query efficiency.
It should also be noted that the device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. In addition, in the drawings of the device embodiments provided by the present invention, a connection between modules indicates a communication connection between them, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
From the description of the above embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, or by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memory, and dedicated components. In general, any function performed by a computer program can readily be implemented with corresponding hardware, and the specific hardware structure used to implement the same function can take many forms, such as analog circuits, digital circuits, or dedicated circuits. For the present invention, however, a software implementation is in most cases the preferred embodiment. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a readable storage medium, such as a computer's floppy disk, USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
In summary, the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A big data real-time processing system based on Hadoop, characterized in that the system comprises: a client, a real-time transmission middleware, and a distributed file system HDFS, wherein:
The HDFS comprises a control node (namenode) and multiple data nodes (datanodes);
The control node is configured to start multiple threads on the multiple data nodes, create in real time an index for each of the multiple data items to be ingested, and store the indexes in multiple index files according to creation time;
The client is configured to send a data acquisition (get) request to the HDFS through the real-time transmission middleware;
The real-time transmission middleware is configured to forward the data acquisition request sent by the client to the control node;
The control node is configured to: create a query task according to the data acquisition request sent by the client, the query task comprising the query condition that the target data must satisfy, and the query condition comprising a query time condition; match the query time condition in the query condition against the multiple index files and filter out the index files that satisfy the query time condition; distribute the query task to the multiple data nodes and query the multiple data nodes according to the filtered index files and the query condition, thereby obtaining the positions of the data that satisfy the query condition; distribute the query task to the multiple data nodes again and read data on the multiple data nodes according to the positions of the data that satisfy the query condition; and return a query result when any one of the multiple data nodes completes its query successfully;
The real-time transmission middleware is configured to poll the query result directory at a preset polling period and, if the query result directory is not empty, read the query result files in the query result directory and return them to the client;
The client is configured to obtain the query result files in real time through the real-time transmission middleware.
2. a kind of big data real time processing system based on Hadoop according to claim 1, it is characterised in that the control Node processed, creating in real time specifically for the structure according to B+ trees needs multiple data of warehouse-in to distinguish corresponding index;And,
The control node, specifically for being looked into by the B+ trees according to the index file for filtering out and the querying condition The plurality of back end is ask, so as to be met the position of the data of the querying condition.
3. The Hadoop-based big data real-time processing system according to claim 1, characterized in that the client is further configured to continue sending a data acquisition continuation request to the HDFS;
The HDFS is further configured to respond to the data acquisition continuation request through the control node, and to start multiple threads on the plurality of data nodes to obtain the query result corresponding to the data acquisition continuation request;
The real-time transmission middleware is further configured to read, according to the preset polling cycle, the query result corresponding to the data acquisition continuation request, and return it to the client.
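The multithreaded retrieval described in this claim can be sketched as a simple fan-out, assuming each data node is represented by an in-memory list of (timestamp, record) pairs. The helper names query_node and query_all_nodes are hypothetical, and a real system would issue network requests rather than scan local lists.

```python
from concurrent.futures import ThreadPoolExecutor


def query_node(node_store, low, high):
    """Return the records of one data node whose timestamp lies in [low, high]."""
    return [record for ts, record in node_store if low <= ts <= high]


def query_all_nodes(node_stores, low, high):
    """Query every data node on its own thread and merge the results in node order."""
    with ThreadPoolExecutor(max_workers=max(1, len(node_stores))) as pool:
        futures = [pool.submit(query_node, store, low, high) for store in node_stores]
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged


# Usage: two stand-in data nodes queried for timestamps 2 through 5.
nodes = [
    [(1, "a1"), (5, "a5")],
    [(2, "b2"), (9, "b9")],
]
batch = query_all_nodes(nodes, 2, 5)
```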
4. The Hadoop-based big data real-time processing system according to claim 1, characterized in that the client is further configured to send an end request to the real-time transmission middleware;
The real-time transmission middleware is further configured to stop polling the query result directory after receiving the end request sent by the client.
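One possible way to stop the polling loop when an end request arrives is a shared stop flag, sketched below in Python. The names run_polling and fake_poll are assumptions for the example, and the end request is simulated by setting the flag after the third poll.

```python
import threading


def run_polling(poll_once, stop_event, interval_seconds):
    """Call poll_once every interval_seconds until stop_event is set; return the poll count."""
    polls = 0
    while not stop_event.is_set():
        poll_once()
        polls += 1
        # wait() returns early as soon as the end request sets the event.
        stop_event.wait(interval_seconds)
    return polls


stop = threading.Event()
seen = []


def fake_poll():
    seen.append("poll")
    if len(seen) == 3:
        stop.set()  # stands in for the client's end request arriving


poll_count = run_polling(fake_poll, stop, 0.01)
```

Waiting on the event rather than sleeping lets the loop exit promptly instead of finishing a full polling cycle after the end request.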
5. The Hadoop-based big data real-time processing system according to claim 1, characterized in that the control node is specifically configured to, after obtaining the positions of the data that satisfy the query condition, determine from those positions the Internet Protocol (IP) address and the offset of the data node where the target data is located, find the data node storing the target data according to the IP address, and then find the target data on that data node according to the offset.
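The two-step lookup in this claim (IP address first, then offset) can be illustrated with the minimal Python sketch below. The Location record, the length field, and the dictionary that maps IP addresses to byte strings are assumptions for the example; the claim itself names only the IP address and the offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    ip: str      # address of the data node holding the record
    offset: int  # byte offset of the record within that node's storage
    length: int  # assumption: the record length is stored alongside the offset


def read_target(node_storage, location):
    """Pick the data node by IP address, then slice the record out at its offset."""
    blob = node_storage[location.ip]
    return blob[location.offset:location.offset + location.length]


# Usage: two stand-in data nodes, each holding a small byte string.
storage = {
    "10.0.0.1": b"alphabravo",
    "10.0.0.2": b"charliedelta",
}
record = read_target(storage, Location(ip="10.0.0.2", offset=7, length=5))
```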
6. The Hadoop-based big data real-time processing system according to claim 1, characterized in that the query time condition includes: a query start time and a query end time.
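With a start time and an end time, matching the query time condition against the index files reduces to an interval-overlap test. The Python sketch below assumes that each index file records the earliest and latest timestamp it covers; the IndexFile fields and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexFile:
    name: str
    first_ts: int  # earliest timestamp covered by this index file
    last_ts: int   # latest timestamp covered by this index file


def filter_index_files(index_files, query_start, query_end):
    """Keep the index files whose time span overlaps [query_start, query_end]."""
    return [
        f for f in index_files
        if f.first_ts <= query_end and f.last_ts >= query_start
    ]


# Usage: a query for timestamps 150 through 220 touches two of three index files.
files = [
    IndexFile("idx-00", 0, 99),
    IndexFile("idx-01", 100, 199),
    IndexFile("idx-02", 200, 299),
]
selected = [f.name for f in filter_index_files(files, 150, 220)]
```

Filtering at the index-file level first means the later per-node query only has to consult the index files that can contain matching data.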
7. The Hadoop-based big data real-time processing system according to claim 1, characterized in that the real-time transmission middleware specifically uses Jetty as its web container.
CN201611255956.1A 2016-12-30 2016-12-30 A large data real-time processing system based on Hadoop Pending CN106649847A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611255956.1A CN106649847A (en) 2016-12-30 2016-12-30 A large data real-time processing system based on Hadoop

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611255956.1A CN106649847A (en) 2016-12-30 2016-12-30 A large data real-time processing system based on Hadoop

Publications (1)

Publication Number Publication Date
CN106649847A true CN106649847A (en) 2017-05-10

Family

ID=58837696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611255956.1A Pending CN106649847A (en) 2016-12-30 2016-12-30 A large data real-time processing system based on Hadoop

Country Status (1)

Country Link
CN (1) CN106649847A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107436923A (en) * 2017-07-07 2017-12-05 北京奇虎科技有限公司 A kind of method and apparatus of the search index in big data cluster
CN109302497A (en) * 2018-11-29 2019-02-01 北京京东尚科信息技术有限公司 Data processing method, access agent device and system based on HADOOP
CN110209853A (en) * 2019-06-14 2019-09-06 重庆紫光华山智安科技有限公司 Image searching method, device and the equipment of vehicle

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102841944A (en) * 2012-08-27 2012-12-26 南京云创存储科技有限公司 Method achieving real-time processing of big data
US8775425B2 (en) * 2010-08-24 2014-07-08 International Business Machines Corporation Systems and methods for massive structured data management over cloud aware distributed file system
CN104199919A (en) * 2014-09-01 2014-12-10 江苏惠网信息技术有限公司 Method for achieving real-time reading of super-large-scale data

Similar Documents

Publication Publication Date Title
US20210103604A1 (en) System and method for implementing a scalable data storage service
US10387402B2 (en) System and method for conditionally updating an item with attribute granularity
US9053167B1 (en) Storage device selection for database partition replicas
US9489443B1 (en) Scheduling of splits and moves of database partitions
Das et al. Big data analytics: A framework for unstructured data analysis
US9052831B1 (en) System and method for performing live partitioning in a data store
US9372911B2 (en) System and method for performing replica copying using a physical copy mechanism
US8819027B1 (en) System and method for partitioning and indexing table data using a composite primary key
US11609697B2 (en) System and method for providing a committed throughput level in a data store
US20140244585A1 (en) Database system providing single-tenant and multi-tenant environments
CN103294786A (en) Metadata organization and management method and system of distributed file system
CN107045422A (en) Distributed storage method and equipment
CN105025053A (en) Distributed file upload method based on cloud storage technology and system
CN103634361B (en) The method and apparatus for downloading file
US10146814B1 (en) Recommending provisioned throughput capacity for generating a secondary index for an online table
CN111221791A (en) Method for importing multi-source heterogeneous data into data lake
CN102750300B (en) High-performance unstructured data access protocol supporting multi-granularity searching.
US11573971B1 (en) Search and data analysis collaboration system
US9983823B1 (en) Pre-forking replicas for efficient scaling of a distribued data storage system
US10069909B1 (en) Dynamic parallel save streams for block level backups
CN107203532A (en) Construction method, the implementation method of search and the device of directory system
US9875270B1 (en) Locking item ranges for creating a secondary index from an online table
CN106599111A (en) Data management method and storage system
CN109766206A (en) A kind of log collection method and system
CN106156319A (en) Telescopic distributed resource description framework data storage method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170510