CN103853766B - Online processing method and system for stream data - Google Patents
- Publication number
- CN103853766B (CN201210510056.2A)
- Authority
- CN
- China
- Prior art keywords
- stream data
- data
- memory cache
- cache layer
- analysis program
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses an online processing method for stream data, comprising: Step 1, establishing an online memory cache layer and, after attribute extraction, storing the stream data in the online memory cache layer in a key-value structure; Step 2, building a hybrid index structure over the stream data in the memory cache layer; Step 3, adding an access flag to each stream data record for which an index structure has been built, the flag indicating the registration status of the different analysis programs with respect to that record, while also recording the state of each analysis program's access to the stream data; Step 4, data cleanup: if a stream data record has been accessed by all designated analysis programs of the memory cache layer, a cleanup operation is performed on that record. The invention significantly reduces data read/write pressure during stream processing, effectively relieves the pressure on the database in large-scale stream data processing systems, and increases the real-time processing speed of stream data.
Description
Technical field
The present invention relates to large-scale data processing, and in particular to an online processing method and system for stream data.
Background
With the progress of the times and the development of the economy, people's everyday demand for information keeps growing. In particular, as the Internet becomes ever more widespread, a massive amount of information is published and propagated online every day. In 2011 the research firm IDC released the report "Extracting Value from Chaos". The report shows that the total amount of information in the world doubles every two years. In 2011, the total amount of data created and replicated worldwide was 1.8 ZB. For example, 1.8 ZB is equivalent to the data produced if every person in the world underwent 215 million high-resolution MRI scans every day.
The task of a large-scale data analysis and processing system is to process massive data and mine valuable knowledge from it. A typical data processing system first collects data from each data source and stores it, then reads the data back from the storage device for analysis and processing. A common architecture for such a system sets up a central database to handle data storage and reading. First, collection programs gather data of different categories from the Internet, such as news, forums, blogs, microblogs, social networks and search engines, and write it to the central database; then, the various analysis programs read data from the database for subsequent analysis and processing. The central database thus carries both the write load and the read load.
System architectures with a database as the storage center have been widely accepted and applied. In a massive-data environment, however, as the variety of data sources, the volume of derived data and the number of analysis applications grow, the problems of the central database architecture become increasingly prominent. Its shortcomings show up mainly in three respects: first, real-time response performance degrades; second, there are too many database interactions; third, data processing is delayed.
As data sources, data volume and the number of applications increase, the shortcomings of traditional data processing and analysis systems built on a central database architecture become increasingly prominent. A new data processing architecture is therefore needed so that the above problems can be effectively alleviated.
In general, approaches to this problem fall into the following four categories:
Message-oriented middleware. Message-oriented middleware is a middleware technology built on a message-passing mechanism or a message-queue model. It can deliver messages to each application, which relieves data read/write pressure, and the middleware can also control how applications access messages. Message-oriented middleware plays an important role in many industry applications. Enterprise applications require message delivery to be reliable and secure; however, an excessive focus on reliability and security increases data processing time and transmission latency, which does not suit the throughput requirements of large-scale data processing.
Distributed message queues. More and more companies and research institutions are trying to use distributed message-oriented systems to alleviate the problems caused by the central database architecture; most of these distributed message queues are released as open-source projects. Distributed message processing systems can provide message services efficiently in a massive-data environment. However, such systems have two problems: first, they read and write data by primary-key lookup only and cannot query on other key fields, so they cannot fully replace the query capability of a relational database; second, in order to guarantee high throughput, they cannot adequately guarantee data integrity and security.
Caching. In computer architecture, memory reads and writes are more than 10 times faster than disk reads and writes. To avoid frequent database reads and writes, some have therefore adopted caching: a block of memory outside the database is set aside as a data buffer, which reduces database load and speeds up data access. This memory-based caching still has two problems: first, it cannot optimize the efficiency of writing data into the database; second, data organized as key-value (Key-Value) pairs cannot support range queries on a specific field.
In-memory databases. In Web applications, data such as user visits and user clicks arrive as a stream, so online processing methods for stream data have become a topic of great concern to both academia and industry. Another research branch of online data processing is the development of in-memory databases. An in-memory database, as the name suggests, is a database that keeps its data in memory and operates on it there. Compared with disk, memory reads and writes are several orders of magnitude faster, so keeping data in memory can greatly improve application performance compared with accessing it from disk. In addition, in-memory databases abandon the traditional approach of disk-based data management: their architecture is redesigned on the assumption that all data resides in memory, with corresponding improvements in data caching, fast algorithms and parallel operation, so data processing is much faster than in a traditional database, typically by more than 10 times. The defining feature of an in-memory database is that its "primary copy" or "working version" is memory-resident, i.e. active transactions deal only with the in-memory copy of the real-time in-memory database. The biggest shortcoming of Redis is that it does not adequately solve the reliability problem of the data service: all data is stored in the memory space of the user application, so if the process restarts or exits abnormally, data is lost. Moreover, it cannot meet the need to query on different fields of the data.
In summary, the ability of the prior art to relieve data access pressure is limited by various factors and cannot meet practical needs.
Summary of the invention
The object of the present invention is to introduce a memory-based online cache layer that, given the characteristics of stream data, takes over the large read/write load originally placed on the database, thereby significantly reducing data read/write pressure during stream processing, effectively relieving the pressure on the database in large-scale stream data processing systems, and increasing the real-time processing speed of stream data.
To achieve the above object, the present invention proposes an online processing method for stream data, comprising:
Step 1: establish an online memory cache layer and, after attribute extraction, store the stream data in the online memory cache layer in a key-value structure;
Step 2: build a hybrid index structure over the stream data in the memory cache layer;
Step 3: add an access flag to each stream data record for which an index structure has been built; the flag indicates the registration status of the different analysis programs with respect to that record, and the state of each analysis program's access to the stream data is recorded at the same time;
Step 4: data cleanup; if a stream data record has been accessed by all designated analysis programs of the memory cache layer, perform a cleanup operation on that record.
The online processing method further comprises: after an analysis program reads a stream data record from the memory cache layer, checking the access flag of that record:
If the record has already been accessed by the analysis program, i.e. its flag bit is in the read state, the record is not returned to the analysis program;
If the record has not been accessed by the analysis program, i.e. its flag bit is in the unread state, the record is returned to the analysis program and its flag bit is set to the read state.
The online processing method further comprises: after a stream data record is read, checking its access flag:
If the record has been accessed by all registered analysis programs, it is removed from the memory cache layer;
Otherwise, check whether the residence time of the record exceeds a threshold; if it does not, continue waiting for access by the analysis programs; if it does, remove the record from the memory cache layer.
The key-value structure in Step 1 is established as follows: for each stream data record, the memory cache layer assigns a unique ID as the key of the record, and the value corresponding to the key is all attribute information of that record. The hybrid index structure in Step 2 is built by combining the key-value structure, a B+ tree index structure and a hash index structure.
Step 2 comprises:
Determining whether the stream data in the online cache layer needs to be queried by field:
If it needs to be queried by field: where a range query is required on the current attribute, a B+ tree index structure is built on that attribute field; where a primary-key query is required on the current attribute, a hash index structure is built on that attribute field;
If it does not need to be queried by field, no index structure is built on that attribute field.
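The per-field decision in Step 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `IndexKind`, `FieldSpec` and `plan_indexes`, the example field names, and the idea of declaring query needs up front are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class IndexKind(Enum):
    NONE = "none"            # field is never queried
    HASH = "hash"            # primary-key (exact-match) queries
    BPLUS_TREE = "b+tree"    # range queries


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical declaration of how one attribute field is queried."""
    name: str
    range_query: bool = False
    key_query: bool = False


def plan_indexes(fields):
    """Map each attribute field to the index kinds Step 2 would build."""
    plan = {}
    for field in fields:
        kinds = []
        if field.range_query:
            kinds.append(IndexKind.BPLUS_TREE)
        if field.key_query:
            kinds.append(IndexKind.HASH)
        # A field that is never queried gets no index at all.
        plan[field.name] = kinds or [IndexKind.NONE]
    return plan


plan = plan_indexes([
    FieldSpec("url", key_query=True),
    FieldSpec("publish_time", range_query=True),
    FieldSpec("body"),
])
```

Here `url` would receive a hash index, `publish_time` a B+ tree index, and `body` no index, which keeps memory spent on indexes proportional to the fields that are actually queried.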
In Step 3: the access flag is a 32-bit integer, each bit of which can represent the access state of one analysis program with respect to a stream data record. When stream data in memory is initialized, every bit of the access flag of every record is 0;
When an analysis program registers with the memory cache layer, the memory cache layer assigns it an access flag bit as its access identifier. After an analysis program accesses a stream data record, the memory cache layer performs a bitwise operation on the record's access flag and the analysis program's access identifier, and takes the result as the record's current access flag.
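The flag arithmetic in Step 3 can be worked through on a single record. The sketch below assumes the bitwise operation is an OR, as the detailed description states; the program names and bit assignments are hypothetical.

```python
# Hypothetical identifiers: each registered analysis program owns one bit
# of the 32-bit access flag.
SENTIMENT_ID = 1 << 0
TOPIC_ID = 1 << 1

flag = 0  # every bit is 0 when the record is initialized

flag |= SENTIMENT_ID  # the sentiment program accesses the record
seen_by_sentiment = bool(flag & SENTIMENT_ID)
seen_by_topic = bool(flag & TOPIC_ID)  # still unread by the topic program

flag |= TOPIC_ID  # the topic program accesses the record
all_registered = SENTIMENT_ID | TOPIC_ID
fully_read = (flag & all_registered) == all_registered
```

Testing a bit with `&` and setting it with `|=` are both constant-time, so the per-record bookkeeping stays cheap no matter how many of the 32 bits are in use.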
In Step 4:
After a stream data record is read, its access flag is checked:
If the record has been accessed by all registered analysis programs, it is removed from the memory cache layer;
Otherwise, check whether the record's residence time exceeds a threshold; if it does not, continue waiting for access by the analysis programs; if it does, remove the record from the memory cache layer.
To achieve the above object, the present invention also provides an online processing system for stream data, comprising:
An online memory cache layer construction module, for establishing the online memory cache layer and, after attribute extraction, storing the stream data in it in a key-value structure;
A hybrid index structure building module, for building a hybrid index structure over the stream data in the memory cache layer;
An access flag construction module, for adding an access flag to each stream data record for which an index structure has been built, the flag indicating the registration status of the different analysis programs with respect to that record, and for recording the state of each analysis program's access to the stream data;
An in-memory stream data cleanup module, for performing a cleanup operation on stream data records that have been accessed by all designated analysis programs of the memory cache layer.
The online processing system further comprises:
A stream data exit and return module, for checking the access flag of a stream data record after it is read: if the record has already been accessed by the analysis program, i.e. its flag bit is in the read state, the record is not returned to the analysis program; if the record has not been accessed by the analysis program, i.e. its flag bit is in the unread state, the flag bit is set to the read state and the record is returned to the analysis program.
In the in-memory stream data cleanup module:
After an analysis program reads a stream data record from the memory cache layer, the record's access flag is checked: if the record has been accessed by all registered analysis programs, it is removed from the memory cache layer; otherwise, check whether the record's residence time exceeds a threshold; if it does not, continue waiting for access by the analysis programs; if it does, remove the record from the memory cache layer.
The beneficial effects of the present invention are as follows: by adding a memory-based data cache tailored to the characteristics of stream data, the online processing method and system of the present invention transfer the large read/write load originally placed on the database to the online cache layer. This effectively relieves the pressure on the database in large-scale stream data processing systems, greatly reduces the read/write pressure of stream data, and improves both the real-time processing speed of stream data and the timeliness of the data processing system.
The present invention is described below with reference to the drawings and specific embodiments, which are not to be taken as limiting the invention.
Brief description of the drawings
Fig. 1 is a flowchart of the online processing method for stream data of the present invention;
Fig. 2 is a schematic diagram of the online processing system for stream data of the present invention.
Detailed description of the embodiments
The core idea of the present invention is to introduce a memory-based online cache layer on top of the original architecture. Given the characteristics of stream data, the large read/write load originally placed on the database is transferred to the online cache, which can then provide data services efficiently.
Fig. 1 is a flowchart of the online processing method for stream data of the present invention. As shown in Fig. 1, the method comprises:
Step 1: establish an online memory cache layer and, after attribute extraction, store the stream data in the online memory cache layer in a key-value structure.
Step 2: build a hybrid index structure over the stream data in the memory cache layer.
Step 3: add an access flag to each stream data record for which an index structure has been built; the flag indicates the registration status of the different analysis programs with respect to that record, and the state of each analysis program's access to the stream data is recorded at the same time.
Stream data is dynamic, but for each stream data record, which analysis programs may access it is fixed.
Step 4: data cleanup; if a stream data record has been accessed by all designated analysis programs of the memory cache layer, perform a cleanup operation on that record.
The key-value structure in Step 1 is established as follows: for each stream data record, the memory cache layer assigns a unique ID as the key of the record, and the value corresponding to the key is all attribute information of that record. An online memory cache layer is added to the original architecture based on a central database. The added memory cache layer manages stream data in memory and exposes data read/write services through a network interface. Adding the memory cache layer changes the data flow of the data processing system. On the one hand, collection programs write the collected stream data into the memory cache, and analysis programs read stream data from the memory cache for analysis. On the other hand, the memory cache periodically writes the stream data held in memory to the database for persistent storage.
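The data flow just described (collectors write, analysis programs read, the cache periodically persists) can be sketched in a few lines. This is an in-process toy under stated assumptions: the class `MemoryCacheLayer`, its method names, and the use of a plain list as the "database" are hypothetical, and the network interface mentioned in the text is left out.

```python
import itertools


class MemoryCacheLayer:
    """Toy in-process stand-in for the online memory cache layer."""

    def __init__(self, database):
        self._records = {}              # key -> attribute dict
        self._unflushed = []            # keys not yet persisted
        self._ids = itertools.count(1)  # source of unique record IDs
        self._database = database       # any object with a list-like append()

    def write(self, attributes):
        """Called by a collection program; returns the assigned key."""
        key = next(self._ids)
        self._records[key] = dict(attributes)
        self._unflushed.append(key)
        return key

    def read(self, key):
        """Called by an analysis program; served from memory only."""
        return self._records.get(key)

    def flush(self):
        """Periodic persistence: push pending records to the database."""
        pending, self._unflushed = self._unflushed, []
        for key in pending:
            if key in self._records:
                self._database.append((key, self._records[key]))
        return len(pending)


database = []  # placeholder for the central database
cache = MemoryCacheLayer(database)
key = cache.write({"source": "forum", "text": "hello"})
record = cache.read(key)
flushed = cache.flush()
```

In a deployment, `flush` would presumably be driven by a timer or background thread; batching the writes this way is what moves the per-record write load off the database.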
In the online memory cache, each stream data record is organized and stored as a key-value pair. For each record, the memory cache assigns a globally unique ID as the key of the record; what is stored after the key is the information for all attributes of the record. All stream data is stored in key-value form, and the key uniquely identifies a record. On top of the key-value storage, the present invention builds a hybrid multi-index structure for stream data, creating different types of index for different fields of each record. For the stored stream data, some queries need to look up a unique value by attribute field, while others need to query a range of a field. For queries with a uniqueness requirement, a hash index is built in memory on those fields, using the unique field as the index key. A uniqueness query on the hash index structure can, in the best case, look up the stream data in O(1) (i.e. constant) time. For attribute fields with range-query requirements, a B+ tree index is built in memory on those fields. A range query through the B+ tree index structure can be completed in O(log n) (i.e. logarithmic) time in the average case.
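The two query paths can be illustrated with a small store that keeps one hash index and one ordered index beside the key-value records. This is a sketch under stated assumptions: a Python `dict` plays the hash index, and a sorted list searched with `bisect` stands in for the B+ tree (it is not a B+ tree, and its inserts are linear rather than logarithmic). The class and field names are hypothetical.

```python
import bisect


class HybridIndexStore:
    """Key-value store with a hash index and an ordered index."""

    def __init__(self, hash_field, range_field):
        self.hash_field = hash_field
        self.range_field = range_field
        self.records = {}     # key -> attributes
        self.hash_index = {}  # unique field value -> key
        self.ordered = []     # sorted list of (range field value, key)

    def put(self, key, attributes):
        self.records[key] = attributes
        self.hash_index[attributes[self.hash_field]] = key
        bisect.insort(self.ordered, (attributes[self.range_field], key))

    def get_unique(self, value):
        """Exact-match lookup through the hash index."""
        key = self.hash_index.get(value)
        return None if key is None else self.records[key]

    def get_range(self, low, high):
        """Return records with low <= range field <= high, in order."""
        # (low,) sorts before every (low, key), so this finds the first
        # entry whose value is >= low by binary search.
        start = bisect.bisect_left(self.ordered, (low,))
        out = []
        for value, key in self.ordered[start:]:
            if value > high:
                break
            out.append(self.records[key])
        return out


store = HybridIndexStore(hash_field="url", range_field="ts")
store.put(1, {"url": "a", "ts": 10})
store.put(2, {"url": "b", "ts": 30})
store.put(3, {"url": "c", "ts": 20})
hit = store.get_unique("b")
window = [r["url"] for r in store.get_range(15, 30)]
```

The binary search locates the start of the range in logarithmic time, after which the scan costs one step per matching record; a B+ tree gives the same lookup pattern while also keeping inserts logarithmic.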
The online processing method further comprises a dynamic registration step:
After an analysis program reads a stream data record from the memory cache layer, the access flag of that record is checked:
If the record has already been accessed by the analysis program, i.e. its flag bit is in the read state, the record is not returned to the analysis program;
If the record has not been accessed by the analysis program, i.e. its flag bit is in the unread state, the record is returned to the analysis program and its flag bit is set to the read state.
The present invention builds in memory a mechanism, based on access control labels, for dynamic registration and deregistration of applications, providing highly scalable streaming reads of data. For stream data, the present invention adds a data access label to each stream data record in memory. The data access label is a 32-bit integer, each bit of which can represent one analysis program's usage of the stream data record. An analysis program must register with the memory cache, which assigns it a data access identifier; that is, one bit of the 32-bit integer is used to represent the registered analysis program. After an analysis program registers successfully, the memory cache assigns it an identifier for accessing data, and the analysis program accesses and uses stream data through that identifier. To reduce the network bandwidth consumed by repeated stream data during processing, no analysis program may access the same stream data record more than once. When data in memory is initialized, every bit of the data access label of every stream data record is 0. After an application has accessed a record, the memory cache performs a bitwise OR of the record's data access label and the analysis program's data access identifier, and takes the result as the record's current data access label. Once an application has accessed a stream data record, it cannot access that record again.
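Registration, deregistration and the read-once rule can be sketched together. The sketch assumes one free bit is handed out per program and reused after deregistration; `AccessRegistry`, `read_once` and the record layout are hypothetical names, and the text does not say how stale bits on existing records are treated when a bit is reused.

```python
class AccessRegistry:
    """Allocates one bit of a 32-bit label to each registered program."""

    MAX_PROGRAMS = 32

    def __init__(self):
        self._bits = {}  # program name -> single-bit identifier

    def register(self, program):
        if program in self._bits:
            return self._bits[program]
        used = set(self._bits.values())
        for position in range(self.MAX_PROGRAMS):
            bit = 1 << position
            if bit not in used:
                self._bits[program] = bit
                return bit
        raise RuntimeError("all 32 label bits are in use")

    def unregister(self, program):
        # Frees the bit; bits already set on stored records are not
        # cleared here (an open point in this sketch).
        self._bits.pop(program, None)

    def all_mask(self):
        """OR of the identifiers of all currently registered programs."""
        mask = 0
        for bit in self._bits.values():
            mask |= bit
        return mask


def read_once(record, program_bit):
    """Return the record's data on a program's first access, else None."""
    if record["flag"] & program_bit:
        return None                # already read by this program
    record["flag"] |= program_bit  # bitwise OR marks it as read
    return record["data"]


registry = AccessRegistry()
a = registry.register("sentiment")
b = registry.register("topics")
record = {"flag": 0, "data": {"text": "hello"}}
first = read_once(record, a)
second = read_once(record, a)
```

The second call returns `None` because the program's bit is already set, which is how the repeated transfer of the same record over the network is avoided.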
Step 4 comprises:
After a stream data record is read, its access flag is checked:
If the record has been accessed by all registered analysis programs, it is removed from the memory cache layer;
Otherwise, check whether the record's residence time exceeds a threshold; if it does not, continue waiting for access by the analysis programs; if it does, remove the record from the memory cache layer.
That is, the present invention establishes an efficient in-memory data cleanup and exit mechanism that promptly cleans up stream data resident in memory and improves the availability of the data service. The cleanup mechanism for stream data in memory considers two cases. In the normal case, the in-memory data cache checks the access control label of a stream data record; if it finds that all registered analysis programs have used the record, it starts the data cleanup process and deletes the record from memory. In the abnormal case, the in-memory data cache checks the access control label and finds that some analysis programs have still not accessed the record; it then evaluates how long the record has resided in memory. If the record has been resident for a long time, exceeding the prescribed time threshold, the data cleanup process is started and the record is deleted from memory; if its residence time does not exceed the prescribed threshold, the record is left untouched and remains in memory.
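The two cleanup cases can be combined into one pass over the cached records. The sketch below assumes each record carries its label and a creation timestamp, and that cleanup runs as a sweep; the text triggers the check after a read, so the sweep form, the function name `sweep` and the record layout are assumptions.

```python
import time


def sweep(records, all_mask, max_age_seconds, now=None):
    """Delete fully read or expired records; return the evicted keys.

    records maps key -> {"flag": int, "created": float, ...}.
    """
    now = time.time() if now is None else now
    evicted = []
    for key, record in list(records.items()):
        # Normal case: every registered program has set its bit.
        fully_read = (record["flag"] & all_mask) == all_mask
        # Abnormal case: some program never read it, but it is too old.
        expired = now - record["created"] > max_age_seconds
        if fully_read or expired:
            del records[key]
            evicted.append(key)
    return evicted


records = {
    1: {"flag": 0b11, "created": 100.0},  # read by both programs
    2: {"flag": 0b01, "created": 100.0},  # one program pending, old
    3: {"flag": 0b01, "created": 190.0},  # one program pending, fresh
}
evicted = sweep(records, all_mask=0b11, max_age_seconds=60, now=200.0)
```

Record 1 leaves through the normal path, record 2 through the timeout path, and record 3 stays in memory to wait for its remaining reader. The timeout bounds memory use even when a registered program stalls or never reads.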
Fig. 2 is a schematic diagram of the online processing system for stream data of the present invention. As shown in Fig. 2, the system comprises:
An online memory cache layer construction module, for establishing the online memory cache layer and, after attribute extraction, storing the stream data in it in a key-value structure;
A hybrid index structure building module, for building a hybrid index structure over the stream data in the memory cache layer;
An access flag construction module, for adding an access flag to each stream data record for which an index structure has been built, the flag indicating the registration status of the different analysis programs with respect to that record, and for recording the state of each analysis program's access to the stream data;
An in-memory stream data cleanup module, for performing a cleanup operation on stream data records that have been accessed by all designated analysis programs of the memory cache layer.
An online memory cache layer is added to the original architecture based on a central database. The added memory cache layer manages stream data in memory and exposes data read/write services through a network interface. Adding the memory cache layer changes the data flow of the data processing system. On the one hand, collection programs write the collected stream data into the memory cache, and analysis programs read stream data from the memory cache for analysis. On the other hand, the memory cache periodically writes the stream data held in memory to the database for persistent storage.
In the online memory cache, each stream data record is organized and stored as a key-value pair. For each record, the memory cache assigns a globally unique ID as the key of the record, and the value corresponding to the key is all attribute information of that record. All stream data is stored in key-value form, and the key uniquely identifies a record. On top of the key-value storage, the present invention builds a hybrid multi-index structure for stream data, creating different types of index for different fields of each record. For the stored stream data, some queries need to look up a unique value by attribute field, while others need to query a range of a field. For queries with a uniqueness requirement, a hash index is built in memory on those fields, using the unique field as the index key. A uniqueness query on the hash index structure can, in the average case, look up the stream data in O(1) (i.e. constant) time. For attribute fields with range-query requirements, a B+ tree index is built in memory on those fields. A range query through the B+ tree index structure can be completed in O(log n) (i.e. logarithmic) time in the average case.
The online processing system further comprises:
A stream data exit and return module, for checking the access flag of a stream data record after it is read: if the record has already been accessed by the analysis program, i.e. its flag bit is in the read state, the record is not returned to the analysis program; if the record has not been accessed by the analysis program, i.e. its flag bit is in the unread state, the flag bit is set to the read state and the record is returned to the analysis program.
The present invention builds in memory a mechanism, based on access control labels, for dynamic registration and deregistration of applications, providing highly scalable streaming reads of data. For stream data, the present invention adds a data access label to each stream data record in memory. The data access label is a 32-bit integer, each bit of which can represent one analysis program's usage of the stream data record. An analysis program must register with the memory cache, which assigns it a data access identifier; that is, one bit of the 32-bit integer is used to represent the registered analysis program. After an analysis program registers successfully, the memory cache assigns it an identifier for accessing data, and the analysis program accesses and uses stream data through that identifier. To reduce the network bandwidth consumed by duplicate data during stream data processing, no analysis program may access the same stream data record more than once. When data in memory is initialized, every bit of the data access label of every stream data record is 0. After an application has accessed a record, the memory cache performs a bitwise OR of the record's data access label and the analysis program's data access identifier, and takes the result as the record's current data access label. Once an application has accessed a stream data record, it cannot access that record again.
In the in-memory stream data cleaning module:
After an analysis program reads a stream data record from the memory cache layer, the access flag of the record is checked. If the record has been accessed by all registered analysis programs, it is removed from the memory cache layer. Otherwise, the residence time of the record is compared with a threshold: if the threshold has not been exceeded, the record continues to wait for access by analysis programs; if the threshold has been exceeded, the record is removed from the memory cache layer.
That is, the present invention establishes an efficient in-memory data cleaning and exit mechanism that promptly cleans up stream data resident in memory and improves the availability of the data service. The cleaning mechanism for in-memory stream data considers two cases. In the normal case, the in-memory data cache checks the access control tag of a stream data record in memory; if it finds that every registered analysis program has already used the record, it starts the data cleaning process and deletes the record from memory, improving the effective utilization of memory. In the abnormal case, the in-memory data cache checks the access control tag and finds that some analysis programs have not yet accessed the record; it then evaluates how long the record has been resident in memory. If the record has been resident for a long time, exceeding the specified time threshold, the data cleaning process is started and the record is deleted from memory; if the residence time does not exceed the specified time threshold, the record is left untouched and remains in memory.
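The two cleaning cases can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the patented implementation: the function names `should_evict` and `sweep`, the representation of a record as an `(access_tag, arrival_time)` pair, and the choice to fall back on the time threshold when no analysis program is registered are all hypothetical.

```python
def should_evict(access_tag, registered_mask, arrival_time, max_residence, now):
    """Decide whether one record should be removed from the memory cache."""
    # Normal case: every registered analysis program has read the record.
    # Assumption: with no registered programs, only the time threshold applies.
    if registered_mask and (access_tag & registered_mask) == registered_mask:
        return True
    # Abnormal case: some program has not read it yet, so evict only when the
    # residence time exceeds the threshold.
    return (now - arrival_time) > max_residence


def sweep(records, registered_mask, max_residence, now):
    """Remove evictable entries from `records` (key -> (access_tag, arrival_time)).

    Returns the list of removed keys in their original insertion order.
    """
    evicted = [
        key
        for key, (tag, arrived) in records.items()
        if should_evict(tag, registered_mask, arrived, max_residence, now)
    ]
    for key in evicted:
        del records[key]
    return evicted
```

In a running system `now` and `arrival_time` would typically come from a monotonic clock such as `time.monotonic()`; they are passed in explicitly here so that the behaviour is easy to check.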
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the invention, those skilled in the art can make various corresponding changes and modifications according to the invention, but all such changes and modifications shall fall within the protection scope of the claims of the present invention.
Claims (10)
1. An online processing method for stream data, characterized by comprising:
Step 1: establishing an online memory cache layer, and, after performing attribute extraction on the stream data, storing the stream data in the online memory cache layer according to a key-value structure;
Step 2: establishing a hybrid index structure for the stream data in the memory cache layer;
Step 3: adding an access flag to each stream data record for which the index structure has been established, the flag being used to indicate the registration status of different analysis programs with respect to the stream data, while recording the state of each analysis program's access to the stream data;
Step 4: data cleaning: if a stream data record has been accessed by all specified analysis programs in the memory cache layer, performing a cleaning operation on that stream data record.
2. The online processing method of claim 1, characterized in that the online processing method further comprises a dynamic registration step:
After an analysis program reads a stream data record from the memory cache layer, the access flag of the record is checked:
If the record has already been accessed by the analysis program, i.e., the flag is in the read state, the record is not returned to the analysis program;
If the record has not been accessed by the analysis program, i.e., the flag is in the unread state, the record is returned to the analysis program and the flag of the record is set to the read state.
3. The online processing method of claim 1, characterized in that the key-value structure in step 1 is established as follows: for each stream data record, the memory cache layer assigns a unique ID as the key of the record, with the key recording the information on all attributes of the stream data record.
4. The online processing method of claim 1, characterized in that the hybrid index structure in step 2 is established by combining the key-value structure, a B+ tree index structure, and a hash index structure.
5. The online processing method of claim 1, characterized in that step 2 comprises:
Determining whether the stream data in the online cache layer needs to be queried by field:
If querying by field is needed: if range queries on the current attribute are required, a B+ tree index structure is built on this attribute field; if primary-key queries on the current attribute are required, a hash index structure is built on this attribute field;
If querying by field is not needed, no index structure needs to be built on this attribute field.
6. The online processing method of claim 1, characterized in that, in step 3: the access flag is a 32-bit integer, each bit of which can represent the access state of one analysis program with respect to the stream data; when the stream data in memory is initialized, every bit of the access flag of each stream data record is 0;
When an analysis program registers with the memory cache layer, the memory cache layer assigns it an access identifier; after an analysis program accesses a stream data record, the memory cache layer performs a bitwise operation on the access flag of the record and the access identifier of the analysis program, and takes the result as the current access flag of the record.
7. The online processing method of claim 1, characterized in that, in step 4:
After a stream data record is read, the access flag of the record is checked:
If the record has been accessed by all registered analysis programs, the record is removed from the memory cache layer;
Otherwise, the residence time of the record is compared with a threshold: if the threshold has not been exceeded, the record continues to wait for access by analysis programs; if the threshold has been exceeded, the record is removed from the memory cache layer.
8. An online processing system for stream data, characterized by comprising:
An online memory cache layer construction module, configured to establish an online memory cache layer and, after attribute extraction has been performed on the stream data, store the stream data in the online memory cache layer according to a key-value structure;
A hybrid index structure establishing module, configured to establish a hybrid index structure for the stream data in the memory cache layer;
An access flag construction module, configured to add an access flag to each stream data record for which the index structure has been established, the flag being used to indicate the registration status of different analysis programs with respect to the stream data, while recording the state of each analysis program's access to the stream data;
An in-memory stream data cleaning module, configured to perform a cleaning operation on stream data that has been accessed by all specified analysis programs in the memory cache layer.
9. The online processing system of claim 8, characterized in that the online processing system further comprises:
A stream data exit and return module, configured to check the access flag of a stream data record after the record is read: if the record has already been accessed by the analysis program, i.e., the flag is in the read state, the record is not returned to the analysis program; if the record has not been accessed by the analysis program, i.e., the flag is in the unread state, the flag of the record is set to the read state and the record is returned to the analysis program.
10. The online processing system of claim 8, characterized in that, in the in-memory stream data cleaning module:
After an analysis program reads a stream data record from the memory cache layer, the access flag of the record is checked:
If the record has been accessed by all registered analysis programs, the record is removed from the memory cache layer;
Otherwise, the residence time of the record is compared with a threshold: if the threshold has not been exceeded, the record continues to wait for access by analysis programs; if the threshold has been exceeded, the record is removed from the memory cache layer.
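As an illustration of the key-value storage and per-attribute index selection recited in claims 3 to 5, the following Python sketch keeps records under a unique ID, builds a range index only on fields that need range queries, and builds a hash index only on fields that need exact-match queries. It is a simplified sketch under assumptions: the class `HybridIndex` and its methods are hypothetical names, and a sorted list searched with `bisect` stands in for a real B+ tree.

```python
import bisect
from collections import defaultdict


class HybridIndex:
    def __init__(self, range_fields=(), point_fields=()):
        self.store = {}  # key-value store: unique record ID -> attributes
        # Sorted (value, record ID) lists; a stand-in for B+ tree indexes.
        self.range_idx = {f: [] for f in range_fields}
        # Hash indexes: field -> value -> set of record IDs.
        self.hash_idx = {f: defaultdict(set) for f in point_fields}
        self._next_id = 0

    def insert(self, attrs):
        """Store a record under a fresh unique ID and update the chosen indexes."""
        rid = self._next_id
        self._next_id += 1
        self.store[rid] = attrs
        for f, entries in self.range_idx.items():
            if f in attrs:
                bisect.insort(entries, (attrs[f], rid))
        for f, buckets in self.hash_idx.items():
            if f in attrs:
                buckets[attrs[f]].add(rid)
        return rid

    def range_query(self, field, low, high):
        """Return IDs of records with low <= attrs[field] <= high, ordered by value."""
        entries = self.range_idx[field]
        start = bisect.bisect_left(entries, (low,))
        end = bisect.bisect_right(entries, (high, float("inf")))
        return [rid for _, rid in entries[start:end]]

    def point_query(self, field, value):
        """Return the sorted IDs of records whose attrs[field] equals value."""
        return sorted(self.hash_idx[field].get(value, ()))
```

Fields that are never queried get no index at all and are reachable only through the record ID in `store`, which mirrors the last branch of claim 5.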
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210510056.2A CN103853766B (en) | 2012-12-03 | 2012-12-03 | A kind of on-line processing method and system towards stream data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210510056.2A CN103853766B (en) | 2012-12-03 | 2012-12-03 | A kind of on-line processing method and system towards stream data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103853766A CN103853766A (en) | 2014-06-11 |
CN103853766B true CN103853766B (en) | 2017-04-05 |
Family
ID=50861433
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210510056.2A Active CN103853766B (en) | 2012-12-03 | 2012-12-03 | A kind of on-line processing method and system towards stream data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103853766B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572973A (en) * | 2014-12-31 | 2015-04-29 | 上海格尔软件股份有限公司 | High-performance memory caching system and method |
CN104657467B (en) * | 2015-02-11 | 2017-09-05 | 南京国电南自维美德自动化有限公司 | A kind of data-pushing framework with subscription/publication of real-time internal memory database |
CN105242971B (en) * | 2015-10-20 | 2019-02-22 | 北京航空航天大学 | Memory object management method and system towards Stream Processing system |
CN106911589B (en) | 2015-12-22 | 2020-04-24 | 阿里巴巴集团控股有限公司 | Data processing method and equipment |
CN106506254B (en) * | 2016-09-20 | 2019-04-16 | 北京理工大学 | A kind of bottleneck node detection method of extensive stream data processing system |
CN106959928B (en) * | 2017-03-23 | 2019-08-13 | 华中科技大学 | A kind of stream data real-time processing method and system based on multi-level buffer structure |
CN110120959B (en) * | 2018-02-05 | 2023-04-07 | 北京京东尚科信息技术有限公司 | Big data pushing method, device, system, equipment and readable storage medium |
CN110609707B (en) * | 2018-06-14 | 2021-11-02 | 北京嘀嘀无限科技发展有限公司 | Online data processing system generation method, device and equipment |
CN110532072A (en) * | 2019-07-24 | 2019-12-03 | 中国科学院计算技术研究所 | Distributive type data processing method and system based on Mach |
CN110532263A (en) * | 2019-08-08 | 2019-12-03 | 杭州广立微电子有限公司 | A kind of integrated circuit test system and its data base management system towards column |
CN110990059B (en) * | 2019-11-28 | 2021-11-19 | 中国科学院计算技术研究所 | Stream type calculation engine operation method and system for tilt data |
CN112035528B (en) * | 2020-09-11 | 2024-04-16 | 中国银行股份有限公司 | Data query method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102495838A (en) * | 2011-11-03 | 2012-06-13 | 成都市华为赛门铁克科技有限公司 | Data processing method and data processing device |
CN102542057A (en) * | 2011-12-29 | 2012-07-04 | 北京大学 | High dimension data index structure design method based on solid state hard disk |
CN102567434A (en) * | 2010-12-31 | 2012-07-11 | 百度在线网络技术(北京)有限公司 | Data block processing method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7689602B1 (en) * | 2005-07-20 | 2010-03-30 | Bakbone Software, Inc. | Method of creating hierarchical indices for a distributed object system |
- 2012-12-03 CN CN201210510056.2A, patent CN103853766B, status: Active
Non-Patent Citations (2)
Title |
---|
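Research and Design of a Streaming Database System; Zhang Lingdong; China Masters' Theses Full-text Database, Information Science and Technology; 20050915 (No. 05); full text * |
Current Status of Stream Data Mining and Research Trends in Statistics; Zhu Jianping et al.; Statistical Research; 20070731; Vol. 24 (No. 7); pp. 84-87 * |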
Also Published As
Publication number | Publication date |
---|---|
CN103853766A (en) | 2014-06-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | ||
EE01 | Entry into force of recordation of patent licensing contract |
Application publication date: 20140611 Assignee: Branch DNT data Polytron Technologies Inc Assignor: Institute of Computing Technology, Chinese Academy of Sciences Contract record no.: 2018110000033 Denomination of invention: Online processing method and system oriented to streamed data Granted publication date: 20170405 License type: Common License Record date: 20180807 |