CN106649847A - A large data real-time processing system based on Hadoop - Google Patents
- Publication number
- CN106649847A
- Authority
- CN
- China
- Prior art keywords
- data
- query
- real
- time
- hadoop
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Library & Information Science (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses a Hadoop-based big data real-time processing system. In embodiments of the system, index filtering is performed while the query task is created, and the filtered index files are distributed to the datanodes as filtering proceeds; meanwhile the datanodes query their local files and return the query results to the client. As soon as the query on any datanode completes, its results can be returned quickly to the client through the periodic polling mechanism of the real-time transmission middleware. Because the data query processing in HDFS is executed concurrently, the system makes maximal use of the computer's hardware, completes queries in real time, and greatly improves query efficiency: users obtain query results as soon as they run a query, and client query requests are answered quickly.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a Hadoop-based big data real-time processing system.
Background
With the development of informatization, the volume of data that enterprises must process is growing explosively and has reached very large scale (for example, from the TB level to the PB level), which brings a series of problems. As data volume grows, system load keeps increasing and the performance of data ingestion and querying declines accordingly. How to get the maximum performance out of a system without increasing hardware cost, so that data ingestion and querying are as fast as possible, is a difficult problem faced by many enterprises.
The emergence of cloud computing provides an efficient way to process massive data. Common cloud computing solutions include frameworks designed on Hadoop, which comprises the Hadoop Distributed File System (HDFS) and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over it. HDFS makes massive data storage easy to implement while effectively preventing a single point of failure and avoiding unnecessary losses. However, the conventional way to retrieve data on HDFS is to launch a global MapReduce search, which runs concurrent operations over large-scale data and must scan all of the data stored on HDFS. In cloud computing, and especially with massive data, such a global MapReduce search on HDFS wastes a great deal of system resources and takes a substantial amount of time.
Summary of the invention
The object of the present invention is to provide a Hadoop-based big data real-time processing system that improves the efficiency of data queries and responds quickly to client query requests.
To achieve the above object, the present invention adopts the following technical solution:
The present invention provides a Hadoop-based big data real-time processing system comprising a client, real-time transmission middleware, and the distributed file system HDFS, wherein:
the HDFS comprises a control node (namenode) and a plurality of data nodes (datanodes);
the control node is configured to start multiple threads on the plurality of data nodes, to create in real time the index corresponding to each of the multiple data items to be ingested, and to store the indexes in multiple index files according to creation time;
the client is configured to send a data acquisition (get) request to the HDFS through the real-time transmission middleware;
the real-time transmission middleware is configured to forward the data acquisition request sent by the client to the control node;
the control node is configured to create a query task according to the data acquisition request sent by the client, the query task comprising the query condition that the target data must satisfy, and the query condition comprising a query time condition; to match the query time condition in the query condition against the plurality of index files and filter out the index files that satisfy the query time condition; to distribute the query task to the plurality of data nodes and query them according to the filtered index files and the query condition, thereby obtaining the locations of the data satisfying the query condition; and to distribute the query task to the plurality of data nodes again, read the data on the plurality of data nodes according to those locations, and return a query result as soon as the query on any one of the plurality of data nodes succeeds;
the real-time transmission middleware is further configured to poll the query result directory at a preset polling period and, if the query result directory is not empty, to read the query result files in it and return them to the client;
the client is further configured to obtain the query result files in real time through the real-time transmission middleware.
The technical solution provided by the present invention has the following advantages:
The Hadoop-based big data real-time processing system provided by embodiments of the present invention processes big data in real time: data ingestion, querying, and transmission within the system are all concurrent and real-time. In embodiments of the present invention, index filtering is performed while the query task is created, and the filtered index files are distributed to the datanodes as filtering proceeds; meanwhile the datanodes query their local files and return query results to the client. As soon as the query on any datanode completes, its results are returned quickly to the client through the periodic polling mechanism of the real-time transmission middleware. Because the data query processing in HDFS is executed concurrently, the computer's hardware is used to the fullest, queries complete in real time, and query efficiency is greatly improved: users obtain query results as soon as they run a query, and client query requests are answered quickly.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the Hadoop-based big data real-time processing system according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the Jetty-based query process according to an embodiment of the present invention.
Detailed description of embodiments
Embodiments of the present invention provide a Hadoop-based big data real-time processing system for improving the efficiency of data queries and responding quickly to client query requests.
To make the objects, features, and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments fall within the scope of protection of the present invention.
The terms "comprising" and "having" and any variations thereof in the description, the claims, and the accompanying drawings are intended to cover non-exclusive inclusion: a process, method, system, product, or device that comprises a series of units is not necessarily limited to those units, and may include other units that are not expressly listed or that are inherent to such a process, method, product, or device.
Each aspect is described in detail below.
One embodiment of the Hadoop-based big data real-time processing system of the present invention completes fast, real-time data queries within a distributed system architecture. It overcomes a drawback of the data processing methods commonly used in prior-art cloud computing solutions, namely wasted system resources and long processing times, and provides an effective method for processing massive data in real time. In this embodiment, data ingestion, querying, and transmission are all concurrent and real-time. Referring to Fig. 1, the Hadoop-based big data real-time processing system provided by the present invention includes a client, real-time transmission middleware, and a distributed file system (Hadoop Distributed File System, HDFS), wherein:
HDFS includes a control node (namenode) and multiple data nodes (datanodes);
the control node is configured to start multiple threads on the multiple data nodes, to create in real time the index corresponding to each of the multiple data items to be ingested, and to store the indexes in multiple index files according to creation time;
the client is configured to send a data acquisition (get) request to HDFS through the real-time transmission middleware;
the real-time transmission middleware is configured to forward the data acquisition request sent by the client to the control node;
the control node is configured to create a query task according to the data acquisition request sent by the client, the query task comprising the query condition that the target data must satisfy, and the query condition comprising a query time condition; to match the query time condition in the query condition against the multiple index files and filter out the index files that satisfy it; to distribute the query task to the multiple data nodes and query them according to the filtered index files and the query condition, thereby obtaining the locations of the data satisfying the query condition; and to distribute the query task to the multiple data nodes again, read the data on them according to those locations, and return a query result as soon as the query on any one of the data nodes succeeds;
the real-time transmission middleware is further configured to poll the query result directory at a preset polling period and, if the directory is not empty, to read the query result files in it and return them to the client;
the client is further configured to obtain the query result files in real time through the real-time transmission middleware.
In embodiments of the present invention, HDFS is implemented on Hadoop. HDFS is highly fault-tolerant, provides high throughput for application data access, and suits applications with very large data sets; it relaxes some POSIX requirements so that data in the file system can be accessed as streams (streaming access).
The real-time transmission middleware sits between the client and HDFS, and all interaction between them, such as forwarding query requests and query results, goes through it. In some embodiments of the present invention, the real-time transmission middleware uses Jetty as the web container. Jetty is an open-source, Java-based servlet container that provides a runtime environment for components such as JSP pages and servlets. A servlet (server applet, in full Java Servlet) is a server-side program written in Java whose main function is to browse and modify data interactively and generate dynamic web content.
In embodiments of the present invention, HDFS includes a control node and multiple data nodes. The data nodes are the storage units: data transmitted through the physical layer interface can be stored on them, and when the client needs to read data from HDFS, it reads from the data nodes. The control node can be divided into an active control node and a standby control node, so that in active-standby mode HDFS can respond to client requests promptly. The data nodes in HDFS execute query tasks independently and each returns its own query feedback independently: as soon as any data node finishes its query, its results can be fed back to the client through the real-time transmission middleware, without waiting for all data nodes to finish and then returning the results together. Data queries are therefore highly efficient.
In some embodiments of the present invention, the control node is specifically configured to create in real time, according to a B+ tree structure, the index corresponding to each of the multiple data items to be ingested; and
the control node is specifically configured to query the multiple data nodes through the B+ trees according to the filtered index files and the query condition, thereby obtaining the locations of the data satisfying the query condition.
The Hadoop-based big data real-time processing system in embodiments of the present invention supports real-time data ingestion. On top of existing HDFS, multiple threads are started on each datanode to create indexes, so index files are created in parallel. Indexes are built on certain significant fields and generated as B+ trees, so each new record only needs to be inserted into a B+ tree. Insertion into a B+ tree takes place only at leaf nodes. After each (key, pointer) index entry is inserted, the number of subtrees in the node is checked against the allowed range. When the number of subtrees in the node after insertion exceeds m (the order of the B+ tree), the leaf node must be split into two nodes, and the parent node must then contain the maximum key and the node address of each of the two nodes. The problem thus reduces to insertion into a non-leaf node. Inserting a key into a non-leaf node works like inserting into a leaf node: the upper limit on the number of subtrees in a non-leaf node is also m, and a node that exceeds it must likewise be split. When the root node splits, it has no parent, so a new parent node must be created to serve as the new root of the tree.
In some embodiments of the present invention, the client is further configured to keep sending data acquisition continue requests to HDFS;
HDFS is further configured to respond to each continue request through the control node and to start multiple threads to obtain, from the multiple data nodes, the query results corresponding to the continue request;
the real-time transmission middleware is further configured to read, at the preset polling period, the query results corresponding to the continue request and return them to the client.
In embodiments of the present invention, the real-time transmission middleware can use Jetty as the web container. While the query runs on HDFS, Jetty repeatedly polls the query result directory and, if it is not empty, reads the query result files and returns them to the client. The client keeps sending continue requests to the HDFS side, the control node starts multiple threads to read the query results, and Jetty returns the results read to the client. If the returned data is empty, the flow ends; otherwise, the client sends another continue request. During the query, as soon as the query on any datanode succeeds, data is returned to the client; there is no need to wait for all datanodes to finish.
Fig. 2 shows the Jetty query flow in an embodiment of the invention, with Jetty as the web container. First, the client sends a get request to the HDFS side. The control node parses the JSON string (JSON is a lightweight data interchange format), instantiates a job object from the query conditions in the JSON string, submits the job for distributed query, and finally returns the results. While the query runs on HDFS, Jetty repeatedly polls the query result directory and, if it is not empty, reads the files and returns them to the client. The client keeps sending continue requests to the HDFS side, and the control node starts multiple threads to read the query results and return the data read to the client. If the query results are empty and the query on HDFS has finished, an empty result is returned and the flow ends. If any step fails, for example because of an exception, a request error, or a missing index folder, the flow returns at that step.
In some embodiments of the present invention, the client is further configured to send an end request to the real-time transmission middleware;
the real-time transmission middleware is further configured to stop polling the query result directory after receiving the end request sent by the client.
If the client sends an end request to the real-time transmission middleware, the middleware stops returning query results to the client. This lets the system respond to the client promptly, reduces the occupation of transmission resources, and improves resource utilization.
In some embodiments of the present invention, the control node is specifically configured to, after obtaining the locations of the data satisfying the query condition, determine from those locations the Internet Protocol (IP) address of the data node where the target data resides and the offset of the target data; to find the data node storing the target data by the IP address; and then to find the target data on that data node by the offset.
The Hadoop-based big data real-time processing system in embodiments of the present invention supports real-time queries. Using a distributed computing system, the control node creates and submits a query task (job), and the query proceeds as follows. First, index filtering is performed on the control node: because index files are named according to their creation time, the query time condition in the query condition is matched against the index file names to select the index files that satisfy the condition. The query task is then distributed to every datanode, where the filtered index files and the query condition are used to search the B+ trees for the locations of the matching data. The task is then distributed again: each machine reads the data at the locations obtained in the previous step and returns the query results. The efficient B+ tree structure and the parallel execution of queries together allow queries to complete in real time. The location of a piece of data records the IP address of the machine (datanode) that stores it and its offset: the machine is found by IP address, and the corresponding data is then found by offset.
In some embodiments of the present invention, the query time condition includes a query start time and a query end time. The query condition must contain a query time condition, so the query start time and end time set by the client are always available; the query task can then be executed according to the client's data acquisition request, started according to the query start time and terminated according to the query end time.
All processing in embodiments of the present invention is executed concurrently, which makes maximal use of the computer's hardware and greatly improves processing efficiency, so users obtain query results as soon as they run a query. The present invention covers real-time data ingestion, real-time querying, and real-time result transmission, all of which are concurrent and real-time. Index filtering is performed while the task is created, the filtered index files are distributed to the datanodes as filtering proceeds, and the datanodes meanwhile query their local files and return data to the client; as soon as the query on any datanode completes, query results are returned to the user. Concurrent processing throughout, maximal use of the hardware, the efficient B+ tree structure, and parallel query execution together let queries complete in real time, greatly improving query efficiency and giving users their results as soon as they run a query.
It should also be noted that the device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the embodiments. In the accompanying drawings of the device embodiments provided by the present invention, a connection between modules indicates a communication connection between them, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
From the description of the above embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware, or by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memory, and dedicated components. In general, any function performed by a computer program can readily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function may vary, for example analog circuits, digital circuits, or dedicated circuits. For the present invention, however, a software implementation is the better embodiment in most cases. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied as a software product stored on a computer-readable storage medium such as a floppy disk, USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disc, including instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
In summary, the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (7)
1. A Hadoop-based big data real-time processing system, characterized in that the Hadoop-based big data real-time processing system comprises a client, real-time transmission middleware, and the distributed file system HDFS, wherein:
the HDFS comprises a control node (namenode) and a plurality of data nodes (datanodes);
the control node is configured to start multiple threads on the plurality of data nodes, create in real time the index corresponding to each of the plurality of data items to be ingested, and store the plurality of indexes in a plurality of index files according to creation time;
the client is configured to send a data acquisition get request to the HDFS through the real-time transmission middleware;
the real-time transmission middleware is configured to forward the data acquisition request sent by the client to the control node;
the control node is configured to create a query task according to the data acquisition request sent by the client, the query task comprising the query condition satisfied by the target data, and the query condition comprising a query time condition; to match the query time condition in the query condition against the plurality of index files and filter out the index files satisfying the query time condition; to distribute the query task to the plurality of data nodes and query the plurality of data nodes according to the filtered index files and the query condition, so as to obtain the locations of the data satisfying the query condition; and to distribute the query task to the plurality of data nodes again, read data on the plurality of data nodes according to the locations of the data satisfying the query condition, and return a query result when any one of the plurality of data nodes queries successfully;
the real-time transmission middleware is configured to poll the query result directory according to a preset polling period and, if the query result directory is not empty, read the query result files in the query result directory and return them to the client;
the client is configured to obtain the query result files in real time through the real-time transmission middleware.
2. The Hadoop-based big data real-time processing system according to claim 1, characterized in that the control node is specifically configured to create in real time, according to a B+ tree structure, the index corresponding to each of the plurality of data items to be ingested; and
the control node is specifically configured to query the plurality of data nodes through the B+ trees according to the filtered index files and the query condition, so as to obtain the locations of the data satisfying the query condition.
3. The Hadoop-based big data real-time processing system according to claim 1, characterized in that the client is further configured to continue sending data acquisition continue requests to the HDFS;
the HDFS is further configured to respond to the continue request through the control node and start multiple threads to obtain, from the plurality of data nodes, the query results corresponding to the data acquisition continue request;
the real-time transmission middleware is further configured to read, according to the preset polling period, the query results corresponding to the data acquisition continue request and return them to the client.
4. The Hadoop-based big data real-time processing system according to claim 1, characterized in that the client is further configured to send an end request to the real-time transmission middleware;
the real-time transmission middleware is further configured to stop polling the query result directory after receiving the end request sent by the client.
5. The big data real-time processing system based on Hadoop according to claim 1, characterized in that the control node is specifically configured, after obtaining the positions of the data satisfying the query condition, to determine from those positions the Internet Protocol (IP) address of the data node where the target data is located and the offset of the target data; to find the data node storing the target data according to the IP address; and then to find the target data in that data node according to the offset.
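The two-step lookup of claim 5 (IP address first, then offset) is sketched below. The `ip:offset:length` position encoding, the in-memory map standing in for data nodes, and the class name `TargetDataLocator` are assumptions for illustration only. A real deployment would presumably connect to the node at that IP address and read from the stored block at the given offset.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of locating target data by data-node IP, then by byte offset. */
class TargetDataLocator {
    /** Position encoded as "ip:offset:length" (assumed format; IPv4 only, since ':' is the separator). */
    static final class Position {
        final String ip;
        final long offset;
        final int length;

        Position(String ip, long offset, int length) {
            this.ip = ip;
            this.offset = offset;
            this.length = length;
        }

        static Position parse(String s) {
            String[] parts = s.split(":");
            if (parts.length != 3) throw new IllegalArgumentException("bad position: " + s);
            return new Position(parts[0], Long.parseLong(parts[1]), Integer.parseInt(parts[2]));
        }
    }

    // Stands in for the cluster: data-node IP -> that node's stored bytes.
    private final Map<String, byte[]> nodesByIp = new HashMap<>();

    void registerNode(String ip, byte[] storage) {
        nodesByIp.put(ip, storage);
    }

    byte[] read(Position pos) {
        // Step 1: find the data node by its IP address.
        byte[] storage = nodesByIp.get(pos.ip);
        if (storage == null) throw new IllegalStateException("unknown data node: " + pos.ip);
        // Step 2: find the target data inside that node by offset and length.
        if (pos.offset < 0 || pos.length < 0 || pos.offset + pos.length > storage.length) {
            throw new IndexOutOfBoundsException("range outside node storage");
        }
        int from = (int) pos.offset; // safe: bounded by storage.length above
        return Arrays.copyOfRange(storage, from, from + pos.length);
    }
}
```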
6. The big data real-time processing system based on Hadoop according to claim 1, characterized in that the query time condition comprises a query start time and a query end time.
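Claim 6 defines the time condition only as a start time and an end time. One plausible use, sketched below under assumed names (`TimeCondition`, `IndexFile`), is deciding which index files to search in claim 2. Under this assumption, a file is kept when the time span it covers overlaps the closed interval from the query start time to the query end time.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical query time condition: a closed interval [start, end]. */
class TimeCondition {
    final Instant start;
    final Instant end;

    TimeCondition(Instant start, Instant end) {
        if (end.isBefore(start)) throw new IllegalArgumentException("query end time precedes start time");
        this.start = start;
        this.end = end;
    }

    /** True if the closed span [from, to] shares at least one instant with [start, end]. */
    boolean overlaps(Instant from, Instant to) {
        return !to.isBefore(start) && !from.isAfter(end);
    }

    /** Hypothetical index-file descriptor: a name plus the time span its entries cover. */
    static final class IndexFile {
        final String name;
        final Instant from;
        final Instant to;

        IndexFile(String name, Instant from, Instant to) {
            this.name = name;
            this.from = from;
            this.to = to;
        }
    }

    /** Keeps only the index files whose time span could contain matching records. */
    List<String> filter(List<IndexFile> files) {
        List<String> kept = new ArrayList<>();
        for (IndexFile f : files) {
            if (overlaps(f.from, f.to)) kept.add(f.name);
        }
        return kept;
    }
}
```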
7. The big data real-time processing system based on Hadoop according to claim 1, characterized in that the real-time transmission middleware specifically uses Jetty as its web container.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611255956.1A CN106649847A (en) | 2016-12-30 | 2016-12-30 | A large data real-time processing system based on Hadoop |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106649847A true CN106649847A (en) | 2017-05-10 |
Family ID: 58837696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611255956.1A Pending CN106649847A (en) | 2016-12-30 | 2016-12-30 | A large data real-time processing system based on Hadoop |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649847A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107436923A (en) * | 2017-07-07 | 2017-12-05 | 北京奇虎科技有限公司 | A kind of method and apparatus of the search index in big data cluster |
CN109302497A (en) * | 2018-11-29 | 2019-02-01 | 北京京东尚科信息技术有限公司 | Data processing method, access agent device and system based on HADOOP |
CN110209853A (en) * | 2019-06-14 | 2019-09-06 | 重庆紫光华山智安科技有限公司 | Image searching method, device and the equipment of vehicle |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102841944A (en) * | 2012-08-27 | 2012-12-26 | 南京云创存储科技有限公司 | Method achieving real-time processing of big data |
US8775425B2 (en) * | 2010-08-24 | 2014-07-08 | International Business Machines Corporation | Systems and methods for massive structured data management over cloud aware distributed file system |
CN104199919A (en) * | 2014-09-01 | 2014-12-10 | 江苏惠网信息技术有限公司 | Method for achieving real-time reading of super-large-scale data |
2016-12-30: CN201611255956.1A filed in CN (CN106649847A, status: Pending)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210103604A1 (en) | System and method for implementing a scalable data storage service | |
US10387402B2 (en) | System and method for conditionally updating an item with attribute granularity | |
US9053167B1 (en) | Storage device selection for database partition replicas | |
US9489443B1 (en) | Scheduling of splits and moves of database partitions | |
Das et al. | Big data analytics: A framework for unstructured data analysis | |
US9052831B1 (en) | System and method for performing live partitioning in a data store | |
US9372911B2 (en) | System and method for performing replica copying using a physical copy mechanism | |
US8819027B1 (en) | System and method for partitioning and indexing table data using a composite primary key | |
US11609697B2 (en) | System and method for providing a committed throughput level in a data store | |
US20140244585A1 (en) | Database system providing single-tenant and multi-tenant environments | |
CN103294786A (en) | Metadata organization and management method and system of distributed file system | |
CN107045422A (en) | Distributed storage method and equipment | |
CN105025053A (en) | Distributed file upload method based on cloud storage technology and system | |
CN103634361B (en) | The method and apparatus for downloading file | |
US10146814B1 (en) | Recommending provisioned throughput capacity for generating a secondary index for an online table | |
CN111221791A (en) | Method for importing multi-source heterogeneous data into data lake | |
CN102750300B (en) | High-performance unstructured data access protocol supporting multi-granularity searching. | |
US11573971B1 (en) | Search and data analysis collaboration system | |
US9983823B1 (en) | Pre-forking replicas for efficient scaling of a distribued data storage system | |
US10069909B1 (en) | Dynamic parallel save streams for block level backups | |
CN107203532A (en) | Construction method, the implementation method of search and the device of directory system | |
US9875270B1 (en) | Locking item ranges for creating a secondary index from an online table | |
CN106599111A (en) | Data management method and storage system | |
CN109766206A (en) | A kind of log collection method and system | |
CN106156319A | Telescopic distributed resource description framework data storage method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170510 |