CN108140047A - Data processing equipment and method and data capsule structure - Google Patents

Info

Publication number
CN108140047A
Authority
CN
China
Prior art keywords
data
capsule structure
subset
window
metadata
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201680059351.7A
Other languages
Chinese (zh)
Other versions
CN108140047B (en
Inventor
Radu Tudoran
Goetz Brasche
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN108140047A
Application granted
Publication of CN108140047B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2455: Query execution
    • G06F16/24568: Data stream processing; Continuous queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a data processing equipment and method and to a corresponding data capsule structure. The data processing equipment (300) is configured to process a data stream (301) comprising a plurality of data elements (301a, b) arranged in chronological order. The data processing equipment (300) comprises a processor (303) configured to generate a plurality of data capsule structures (305a, b) based on the data stream (301), wherein each data capsule structure (305a, b) contains a subset of the chronologically ordered data elements (301a, b) of the data stream (301). The processor (303) is further configured to provide metadata (307a, b) for each data capsule structure (305a, b), wherein the metadata (307a, b) defines the temporal order of each data capsule structure (305a, b) relative to the other data capsule structures (305a, b) of the plurality.

Description

Data processing equipment and method and data capsule structure
Technical field
In general, the present invention relates to the field of data processing. More specifically, the present invention relates to a data processing equipment and method for processing a data stream, and to a corresponding data capsule structure.
Background
In today's information-driven environment, processing large volumes of data quickly can be both challenging and extremely important. Such data is often provided as a data stream, i.e. a continuous or semi-continuous flow of data, as events occur, and in many cases the data elements are generated in real time. For example, sensors used for radio-frequency identification (RFID) in tracking and access applications can provide streaming data about the position of a tracked target. Reacting quickly to specific signals in streaming data is a critical aspect of many applications. For example, a network monitoring system for detecting security threats needs to detect and report events that appear in the data streams collected by monitoring.
Conventionally, streaming data is processed by first storing the data in a database. The database can then be queried to retrieve the data for further processing. Analysing data in real time is therefore very difficult, because the number of database accesses is limited, in particular for streams with high data rates. To address this problem, several traditional software technologies have been redesigned, for example main-memory database management systems.
In recent years, technologies known as "complex event processing" or "event stream processing" have been developed. With these technologies, events that manifest as meaningful patterns in a data stream can be detected while the stream is being processed. In this context, a new class of architectures in the form of stream processing engines has emerged, for example "Aurora", "STREAM" and "TelegraphCQ", designed specifically to support high-volume, low-latency data stream processing applications.
The conventional stream processing paradigm involves a set of operations applied to all elements or events of a data stream. In many cases, processing a data stream involves a particular computation that must be repeated over time based on the continuously changing data elements or events of the stream, for example computing a moving average. In other words, a new data element or event received as part of the data stream usually has to be processed in the context of the earlier data elements of the stream. Alternatively, in some application scenarios, each new data element triggers a repeated computation over all available data elements of the stream, i.e. both the earlier data elements and the new one.
Such scenarios typically involve dividing the data stream into different windows, i.e. time intervals, to which the computation is applied. A window is a temporal or sequential delimitation of the data stream that contains a subset of the data elements or events of the stream. This concept is illustrated in Fig. 1. Conventional stream processing engines use windows to apply processing functions (for example, computing operations) to data streams or events. The same mechanism is used whether events are received in real time or are stored in a database and processed repeatedly by replaying them to emulate stream behaviour.
A particular aspect is that conventional windows are transient. A window is formed for the purpose of applying a computing operation to the data elements or events of the data stream, but it is not preserved after the operation completes. Because the windows associated with processing a stream are generally discarded, any subsequent operation on the data stream requires the windows to be regenerated and recomputed. This happens even for small updates to a window, such as a time recalibration. In practice, for many analysis scenarios the partitioning of the stream into windows is done relative to the moment of analysis, so this is a common operation. As time advances, the reference time of the system advances as well, which requires the windows to be readjusted.
Fig. 2 illustrates an example scenario that may occur in a conventional stream processing engine. For example, at time X, the data elements or events of the data stream can be grouped into windows of one hour each, starting from the current time and going back six months. Based on this grouping of data elements or events into windows, various statistical measures can be computed, for example the average of the data elements or events of each window. After new data elements or events arrive as part of the data stream at time X+Δ, these statistical measures must be computed again, because the windows have to be regenerated from scratch for time X+Δ.
In view of the above, there is a need for an improved data processing equipment and method and an improved data capsule structure which, in particular, allow the temporal structure of the data elements in a data stream to be exploited in an improved manner.
Summary of the invention
It is an object of the invention to provide an improved data processing equipment and method and an improved data capsule structure which, in particular, allow the temporal structure of the data elements in a data stream to be exploited in an improved manner.
The foregoing and other objects are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
According to a first aspect, the invention relates to a data processing equipment for processing a data stream, the data stream comprising a plurality of data elements arranged in chronological order in the data stream. The data processing equipment comprises a processor configured to generate a plurality of data capsule structures based on the data stream, wherein each data capsule structure contains a subset of the chronologically ordered data elements of the data stream. The processor is further configured to provide metadata for each data capsule structure, wherein the metadata defines the temporal order of each data capsule structure relative to the other data capsule structures of the plurality, i.e. the temporal order of each data capsule structure within the data stream.
Thus, an improved data processing equipment is provided which, in particular, allows the temporal structure of the data elements in the data stream to be exploited. Based on the metadata, the data processing equipment provides time-based management of the data capsule structures. The resulting performance advantage is a saving in the computation cycles that would otherwise be spent re-running stream operations triggered by updates. The number of computation cycles saved varies with the specific operation and context, and in some cases can reach more than 90%.
In a first possible implementation form of the data processing equipment according to the first aspect as such, each data capsule structure comprises at least one data window and/or at least one data sub-container, wherein the at least one data window contains at least one data element of the subset of data elements, and the at least one data sub-container contains at least two data sub-windows, each data sub-window containing at least one data element of the subset of data elements.
Providing such a hierarchy of data capsule structures, data sub-containers, data windows and/or data sub-windows makes it possible, for example, to use different time granularities at different levels of the hierarchy.
In a second possible implementation form of the data processing equipment according to the first aspect as such or its first implementation form, the metadata of each data capsule structure comprises a pointer to a data element of the subset of data elements, namely the first data element in the temporal order of the subset. In an implementation form, the metadata of each data capsule structure may further comprise a data capsule structure identifier and/or a start time or end time of the data capsule structure.
In a third possible implementation form of the data processing equipment according to the first aspect as such or its first or second implementation form, the processor is further configured to provide, for each data capsule structure, statistical data associated with the subset of data elements of the data capsule structure. In an implementation form, the statistical data may comprise at least one of the following: the count, sum, mean, median, variance, standard deviation and/or coefficient of variation of the data elements. The statistical data may comprise a plurality of pairs, wherein each pair comprises a statistical function applied to the data elements of the data capsule structure or of a window thereof, and the value of that statistical function.
In a fourth possible implementation form of the data processing equipment according to the third implementation form of the first aspect, the metadata of each data capsule structure further comprises a pointer to the statistical data associated with the subset of data elements of the data capsule structure.
In a fifth possible implementation form of the data processing equipment according to the first aspect as such or any one of its first to fourth implementation forms, the processor is further configured to adjust each data capsule structure by adding data elements of the data stream to the subset of data elements of the data capsule structure or by removing data elements from the subset of data elements of the data capsule structure. In an implementation form, when updating or adjusting a data capsule structure, the processor is configured to use a deferred update scheme or an active update scheme for the statistical data associated with the data elements of the data capsule structure.
In a sixth possible implementation form of the data processing equipment according to the first aspect as such or any one of its first to fifth implementation forms, the data processing equipment further comprises a memory for storing the plurality of data capsule structures.
According to a second aspect, the invention relates to a data capsule structure for storing a plurality of data elements arranged in chronological order in a data stream. The data capsule structure comprises a subset of the chronologically ordered data elements of the data stream and metadata defining the temporal order of the data capsule structure within the data stream, i.e. the temporal order of the data capsule structure relative to other capsule structures based on the data stream.
In a first possible implementation form of the data capsule structure according to the second aspect as such, the data capsule structure comprises at least one data window and/or at least one data sub-container, wherein the at least one data window contains at least one data element of the subset of data elements, and the at least one data sub-container contains at least two data sub-windows, each data sub-window containing at least one data element of the subset of data elements.
In a second possible implementation form of the data capsule structure according to the second aspect as such or its first implementation form, the metadata of the data capsule structure comprises a pointer to a data element of the subset of data elements, namely the first data element in the temporal order of the subset.
In a third possible implementation form of the data capsule structure according to the second aspect as such or its first or second implementation form, the data capsule structure further comprises statistical data associated with the subset of data elements of the data capsule structure.
In a fourth possible implementation form of the data capsule structure according to the third implementation form of the second aspect, the metadata of the data capsule structure comprises a pointer to the statistical data associated with the subset of data elements of the data capsule structure.
In a fifth possible implementation form of the data capsule structure according to the second aspect as such or any one of its first to fourth implementation forms, the data capsule structure is adjustable by adding data elements of the data stream to the subset of data elements of the data capsule structure or by removing data elements from the subset of data elements of the data capsule structure.
According to a third aspect, the invention relates to a data processing method for processing a data stream, the data stream comprising a plurality of data elements arranged in chronological order. The data processing method comprises the following steps: generating a plurality of data capsule structures based on the data stream, wherein each data capsule structure contains a subset of the chronologically ordered data elements of the data stream; and providing metadata for each data capsule structure, wherein the metadata defines the temporal order of each data capsule structure relative to the other data capsule structures of the plurality.
The data processing method according to the third aspect of the invention can be performed by the data processing equipment according to the first aspect of the invention. Further features of the data processing method according to the third aspect result directly from the functionality of the data processing equipment according to the first aspect and its different implementation forms described above.
According to a fourth aspect, the invention relates to a computer program comprising program code for performing, when run on a computer, the data processing method according to the third aspect of the invention or any one of its implementation forms.
The invention can be implemented in hardware and/or software.
Brief description of the drawings
Further embodiments of the invention will be described with reference to the following figures, in which:
Fig. 1 shows a schematic diagram illustrating an aspect of a conventional stream processing engine;
Fig. 2 shows a schematic diagram illustrating an aspect of a conventional stream processing engine;
Fig. 3 shows a schematic diagram illustrating a data processing equipment for processing a data stream according to an embodiment;
Fig. 4 shows a schematic diagram illustrating a data processing method for processing a data stream according to an embodiment;
Fig. 5 shows a schematic diagram illustrating a plurality of data capsule structures according to an embodiment, provided by a data processing equipment according to an embodiment;
Fig. 6 shows a schematic diagram illustrating different aspects of a data capsule structure according to an embodiment, provided by a data processing equipment according to an embodiment;
Fig. 7 shows a schematic diagram illustrating different aspects of a data capsule structure according to an embodiment, provided by a data processing equipment according to an embodiment;
Fig. 8 shows a schematic diagram illustrating different aspects of a data processing equipment according to an embodiment and of a plurality of data capsule structures according to an embodiment; and
Fig. 9 shows a table illustrating the performance of a data processing equipment and method according to an embodiment.
Detailed description
The following description refers to the accompanying figures, which form part of the disclosure and which show, by way of illustration, specific aspects in which the invention may be practised. It should be understood that other aspects may be utilised and that structural or logical changes may be made without departing from the scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, as the scope of the invention is defined by the appended claims.
For example, it should be understood that a disclosure made in connection with a described method may also hold true for a corresponding equipment or system configured to perform the method, and vice versa. For example, if a specific method step is described, a corresponding equipment may include a unit for performing the described method step, even if such a unit is not explicitly described or illustrated in the figures. Furthermore, it should be understood that the features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise.
Fig. 3 shows a schematic diagram illustrating a data processing equipment 300 for processing a data stream 301 according to an embodiment. The data stream 301 is formed by a plurality of data elements 301a, b arranged in chronological order in the data stream 301, for example according to the arrival time of the respective data element 301a, b at the data processing equipment 300.
The data processing equipment 300 comprises a processor 303 configured to generate a plurality of data capsule structures 305a, b based on the data stream 301, wherein each data capsule structure 305a, b contains a subset of the chronologically ordered data elements 301a, b of the data stream 301. A memory 309 can be provided as part of the data processing equipment 300 for storing the plurality of data capsule structures 305a, b.
The processor 303 is further configured to provide metadata 307a, b for each data capsule structure 305a, b, wherein the metadata 307a, b defines the temporal order of each data capsule structure 305a, b relative to the other data capsule structures 305a, b of the plurality, i.e. the temporal order of each data capsule structure 305a, b within the data stream 301. For example, in the embodiment shown in Fig. 3, the metadata 307b of the data capsule structure 305b can define the temporal order of the data capsule structure 305b relative to the data capsule structure 305a.
Thus, the data capsule structures 305a, b shown in Fig. 3 are used for storing the data elements 301a, b arranged in chronological order in the data stream 301. Each data capsule structure 305a, b comprises a subset of the chronologically ordered data elements 301a, b of the data stream 301 and metadata 307a, b defining the temporal order of the data capsule structure 305a, b within the data stream 301, i.e. the temporal order of the data capsule structure 305a, b relative to the other capsule structures 305a, b based on the data stream 301.
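The arrangement of Fig. 3 can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Event`, `Metadata`, `Container` and `build_containers`, the `(timestamp, value)` element layout and the fixed window width are all assumptions made for illustration. The sketch partitions a time-ordered stream into containers and lets the start time stored in each container's metadata define the order of the containers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical element layout: (timestamp, value).
Event = Tuple[float, float]


@dataclass
class Metadata:
    window_id: int
    start: float  # inclusive start time of the window
    end: float    # exclusive end time of the window


@dataclass
class Container:
    meta: Metadata
    elements: List[Event] = field(default_factory=list)


def build_containers(stream: List[Event], width: float) -> List[Container]:
    """Partition a time-ordered stream into fixed-width windows."""
    by_index: Dict[int, Container] = {}
    for ts, value in stream:
        idx = int(ts // width)
        if idx not in by_index:
            meta = Metadata(idx, idx * width, (idx + 1) * width)
            by_index[idx] = Container(meta)
        by_index[idx].elements.append((ts, value))
    # The metadata (here: the start time) defines the order of the containers.
    return sorted(by_index.values(), key=lambda c: c.meta.start)


stream = [(0.5, 1.0), (1.2, 2.0), (1.9, 3.0), (3.1, 4.0)]
containers = build_containers(stream, width=1.0)
```

Because the containers are kept after they are built, a later operation can look at `containers[i].meta` to locate a window in time without scanning the raw stream again.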
Fig. 4 shows a schematic diagram of a data processing method 400 according to an embodiment for processing a data stream, for example the data stream 301 shown in Fig. 3, wherein the data stream 301 comprises a plurality of data elements 301a, b arranged in chronological order.
The data processing method 400 comprises a step 401 of generating a plurality of data capsule structures 305a, b based on the data stream 301, wherein each data capsule structure 305a, b contains a subset of the chronologically ordered data elements 301a, b of the data stream 301, and a step 403 of providing metadata 307a, b for each data capsule structure 305a, b, wherein the metadata 307a, b defines the temporal order of each data capsule structure 305a, b relative to the other data capsule structures 305a, b of the plurality.
Further implementation forms, embodiments and aspects of the data processing equipment 300, the data capsule structures 305a, b and the data processing method 400 are described below. As already described above, the invention provides a data capsule structure, for example the data capsule structures 305a, b shown in Fig. 3, that allows the data elements or events of the data stream 301 to be organised into windows for storage. The structure encapsulates the concept of grouping data elements or events into data capsule structures or windows and of preserving this partitioning when the data is stored.
In an embodiment, the data processing equipment and the data capsule structures use time-based references as the internal references for delimiting, managing, accessing and navigating the data. In an embodiment, the events or data elements are organised and preserved within a window in a hierarchical manner. That is, in an embodiment, the data capsule structure preserving a window is composed of sub-containers corresponding to smaller granularities/partitions of the data. In an embodiment, these granularities can go down to the level of individual events or data elements. In an embodiment, a data capsule structure can have time boundaries that serve as delimiters and navigation references. Furthermore, in an embodiment, the internal structures that make up a data capsule structure can also have their own corresponding time references.
In an embodiment, the data processing equipment 300 is configured to apply a plurality of operations to the data capsule structures 305a, b. The purpose is to save computation cycles when the partitioning of the events into windows, or the statistics computed on them, need to be updated. Because the window partitioning of the stream is preserved, a window only has to be readjusted according to the specific operation when an update is applied, rather than being regenerated from the raw data. The organisation of the data in windows is also flexible, in the sense that windows can be combined to obtain a new partitioning of the data, and window time boundaries can be readjusted to delete, add and insert events or containers. To this end, the data capsule structure preserving the data can be extensible and allow internal changes.
In addition, a data capsule structure can record predefined statistical data about a window; this statistical data is linked from the metadata, which also enables navigation to the corresponding container. In an embodiment, the statistical data can be preserved and updated when new updates occur. This allows quick access to commonly used aggregate information computed over the event stream. Several of these aspects are described in more detail below.
In an embodiment of the data processing equipment 300, each data capsule structure 305a, b can comprise at least one data window and/or at least one data sub-container, wherein the at least one data window contains at least one data element of the subset of data elements, and the at least one data sub-container contains at least two data sub-windows, each data sub-window containing at least one data element of the subset of data elements. This embodiment is described in more detail below in the context of Fig. 5, which shows the hierarchical structure of a plurality of data capsule structures 305a-c according to an embodiment.
The data capsule structures 305a-c shown in Fig. 5 can hold other containers (referred to herein as data sub-containers) or a plurality of events or data elements. Each upper level of the hierarchy therefore corresponds to windows of a specific time granularity. The structure encapsulates the time references and meta-information that enable time-based navigation through the data.
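A hierarchy of this kind can be sketched as a small tree in Python. This is an illustrative assumption, not the patent's data layout: `Node`, `Event`, `count_events` and `depth` are hypothetical names, and a node simply holds either sub-containers or events between its own time boundaries.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Event:
    ts: float
    value: float


@dataclass
class Node:
    """A container covering [start, end); children are sub-containers or events."""
    start: float
    end: float
    children: List[Union["Node", Event]] = field(default_factory=list)


def count_events(node: Node) -> int:
    """Count the events stored anywhere below this container."""
    total = 0
    for child in node.children:
        total += count_events(child) if isinstance(child, Node) else 1
    return total


def depth(node: Node) -> int:
    """Number of container levels, counting this node as level 1."""
    sub_depths = [depth(c) for c in node.children if isinstance(c, Node)]
    return 1 + max(sub_depths, default=0)


# A two-hour window made of two one-hour sub-containers (times in hours).
hour0 = Node(0.0, 1.0, [Event(0.2, 10.0), Event(0.7, 12.0)])
hour1 = Node(1.0, 2.0, [Event(1.5, 8.0)])
two_hours = Node(0.0, 2.0, [hour0, hour1])
```

Each level carries its own time boundaries, so an operation that needs hourly granularity can work on `hour0` and `hour1` directly, while a coarser operation works on `two_hours`.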
Therefore, in an embodiment, the metadata 307a, b of each data capsule structure 305a, b can comprise a pointer to a data element of the subset of data elements, namely the first data element in the temporal order of the subset. In an implementation form, the metadata of each data capsule structure may further comprise a data capsule structure identifier and/or a start time or end time of the data capsule structure.
Therefore, the data capsule structures 305a-c shown in Fig. 5 can preserve any kind of partitioning generated from a stream, for example the data stream 301 shown in Fig. 3. In an embodiment, the meta-information about the windows, the time-granularity organisation and the required navigation information is collected in the metadata 307a.
In addition to the metadata 307a, information about certain statistical data computed on the windows can also be stored for quick retrieval. Therefore, in an embodiment, the processor 303 is further configured to provide, for each data capsule structure 305a-c, statistical data 311a associated with the subset of data elements of the respective data capsule structure 305a-c. For clarity, only the statistical data 311a of the data capsule structure 305a is indicated in Fig. 5.
In an embodiment, the statistical data 311a can comprise at least one of the following: the count, sum, mean, median, variance, standard deviation and/or coefficient of variation of the data elements. The statistical data 311a can comprise a plurality of pairs, wherein each pair comprises a statistical function applied to the data elements of the data capsule structure or of a window thereof, and the value of that statistical function.
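The pairs of statistical function and value can be sketched with Python's standard `statistics` module. The dictionary layout and the names `STAT_FUNCTIONS` and `compute_stats` are assumptions for illustration; the choice of population variance and standard deviation (rather than the sample versions) is also an assumption, since the text does not say which is meant.

```python
import statistics
from typing import Callable, Dict, List

# Hypothetical registry mapping a function type to its implementation.
STAT_FUNCTIONS: Dict[str, Callable[[List[float]], float]] = {
    "count": lambda xs: float(len(xs)),
    "sum": lambda xs: float(sum(xs)),
    "mean": statistics.fmean,
    "median": statistics.median,
    "variance": statistics.pvariance,   # population variance (assumption)
    "stdev": statistics.pstdev,         # population standard deviation
    "cv": lambda xs: statistics.pstdev(xs) / statistics.fmean(xs),
}


def compute_stats(values: List[float], names: List[str]) -> Dict[str, float]:
    """Return the (function type, value) pairs for one window."""
    return {name: STAT_FUNCTIONS[name](values) for name in names}


window_values = [1.0, 2.0, 3.0, 4.0]
stats = compute_stats(window_values, ["count", "sum", "mean", "median", "variance"])
```

Storing the result keyed by function type means a later query for, say, the mean of a window can be answered from `stats` without touching the window's elements.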
In an embodiment, the metadata 307a of the data capsule structure 305a further comprises a pointer to the statistical data 311a associated with the subset of data elements of the data capsule structure 305a.
At each level of the structural hierarchy, the structure can be manipulated, extended and readjusted. Some of these operations are supported simply by readjusting the metadata information, without actually accessing the data itself. Moreover, an explicit goal is that the corresponding data capsule structure can be manipulated at any granularity, so that it can be reused when an update is required, for example a recalibration of windows or the insertion/removal of events, data elements or windows. In addition, with a mechanism for loosely linking windows, window-based operations can be performed, for example window composition or decomposition to obtain other time granularities.
Therefore, in an embodiment, the processor 303 is further configured to adjust each data capsule structure 305a-c by adding data elements of the data stream 301 to the subset of data elements of the data capsule structure 305a-c or by removing data elements from the subset of data elements of the data capsule structure 305a-c.
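Adjusting a preserved window instead of rebuilding it can be sketched as follows. The class `SlidingWindow` and its fields are hypothetical; the sketch keeps a running sum so that adding a new element and removing expired ones only touches the elements that change, which is one simple way to realise the idea, not necessarily the one used in the patent.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class SlidingWindow:
    """A preserved window of the last `width` time units with a running sum."""

    def __init__(self, width: float) -> None:
        self.width = width
        self.events: Deque[Tuple[float, float]] = deque()  # (timestamp, value)
        self.total = 0.0

    def add(self, ts: float, value: float) -> None:
        """Add a new element, then recalibrate the window to time `ts`."""
        self.events.append((ts, value))
        self.total += value
        self._evict(ts)

    def _evict(self, now: float) -> None:
        """Remove elements that fell out of the window (now - width, now]."""
        while self.events and self.events[0][0] <= now - self.width:
            _, old_value = self.events.popleft()
            self.total -= old_value

    def mean(self) -> Optional[float]:
        return self.total / len(self.events) if self.events else None


window = SlidingWindow(width=10.0)
window.add(0.0, 1.0)
window.add(5.0, 3.0)
window.add(12.0, 5.0)  # the element at t=0 leaves the window here
```

After the third call the window holds the elements at t=5 and t=12, and the mean is available without iterating over the stored elements.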
Fig. 6 shows the metadata 307a of the data capsule structure 305a according to an embodiment. As already described above, the metadata 307a mainly serves to preserve the temporal partitioning of the stream of events, i.e. the information about the delimitation of each window. In addition, the metadata 307a can contain references to the container holding the corresponding data of the window and to the statistical data associated with it. To preserve this information, each element of the metadata 307a can be structured as a record (i.e. an n-tuple) in which each field holds a specific type of information: window ID, start time, end time, pointer to the container head, pointer to the container tail, pointer to the statistical data, and reference ID of the enclosing window. The records can be organised in several ways in an implementation, for example as a hash table or a tree.
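The seven-field record can be sketched as a Python dataclass stored in a dictionary, which plays the role of the hash table. The field and function names are assumptions, and integer indices stand in for the pointers mentioned in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WindowRecord:
    window_id: int
    start_time: float
    end_time: float
    head: Optional[int]       # index of the first element (stand-in for a pointer)
    tail: Optional[int]       # index of the last element
    stats_ref: Optional[int]  # key of the associated statistics entry
    parent_id: Optional[int]  # ID of the enclosing window, None at the top level


# Hash-table organisation of the records, keyed by window ID.
records: Dict[int, WindowRecord] = {}


def register(record: WindowRecord) -> None:
    records[record.window_id] = record


def enclosing_chain(window_id: int) -> List[int]:
    """Walk from a window up through its enclosing windows."""
    chain: List[int] = []
    current: Optional[int] = window_id
    while current is not None:
        chain.append(current)
        current = records[current].parent_id
    return chain


register(WindowRecord(1, 0.0, 24.0, head=0, tail=99, stats_ref=10, parent_id=None))
register(WindowRecord(2, 0.0, 1.0, head=0, tail=4, stats_ref=11, parent_id=1))
```

With this layout, changing a window's boundaries or re-parenting it only rewrites fields of a record; the stored elements are not touched.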
In addition, Fig. 6 shows illustrative statistical data 311a associated with the data element of data capsule structure 305a.Such as Fruit other than being directed to each window (or closed window) by counting statistics data, type function and functional value to list can To save as a part of statistical data 311a.When new statistical function is applied on the window of data flow, the statistics letter Several types and resulting value can be recorded in statistical data 311a.The example of this statistical function is:Counting, summation, average value, Intermediate value, variance, standard deviation, coefficient of variation etc..Each window (crossing over used different grain size layer) can have its own To this list.It, can be with such as in the case of metadata 307a of the range to take a different form in Hash table to tree or list The set of all these lists of tissue pair.Key point is to must be allowed in reference statistical data 311a (that is, to list) Each single item so that the pointer of the item can be stored in metadata 307a.
When an update (for example, a recalibration, new items, a deletion, ...) is applied to a data container structure, two options are available for the statistical data 311a: lazy update or eager update. The update policy can be set when the structure is generated and can be changed during its lifetime. Lazy evaluation implies that, when any operation is applied to the data container structure (or within a data container structure), a dirty flag is set to mark the corresponding statistical data 311a as not up to date. When one of the functions recorded in the statistical data 311a is later recomputed, the record is updated and the dirty flag is cleared. With the eager update option, each update applied to a data container structure or window triggers a new computation of all statistical values. In this way, correct statistical information about the windows in the data stream is immediately available afterwards.
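The two update policies can be sketched as follows. The sketch is illustrative and rests on assumptions: the names `WindowStats` and `FUNCTIONS` are hypothetical, a single dirty flag is kept per window, and a lazy read refreshes every recorded function before clearing the flag so that no stale value survives.

```python
import statistics
from typing import Callable, Dict, List

# Hypothetical registry of statistical functions, keyed by function type.
FUNCTIONS: Dict[str, Callable[[List[float]], float]] = {
    "count": lambda xs: float(len(xs)),
    "sum": lambda xs: float(sum(xs)),
    "mean": statistics.fmean,
}


class WindowStats:
    """Pairs of (function type, value) for one window, with a dirty flag."""

    def __init__(self, values: List[float], eager: bool = False) -> None:
        self.values = list(values)
        self.eager = eager
        self.pairs: Dict[str, float] = {}
        self.dirty = False

    def _recompute(self, name: str) -> None:
        self.pairs[name] = FUNCTIONS[name](self.values)

    def add(self, value: float) -> None:
        """Apply an update (here: a new data element) to the window."""
        self.values.append(value)
        if self.eager:
            for name in list(self.pairs):  # eager: refresh every recorded function now
                self._recompute(name)
        else:
            self.dirty = True              # lazy: only mark the statistics as stale

    def get(self, name: str) -> float:
        """Return a statistic, recomputing only when it is missing or stale."""
        if self.dirty:
            for recorded in list(self.pairs):
                self._recompute(recorded)
            self.dirty = False
        if name not in self.pairs:
            self._recompute(name)
        return self.pairs[name]
```

The lazy policy defers the cost to the next read, which pays off when many updates arrive between reads; the eager policy pays on every update but keeps every read cheap.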
The data container structure is responsible for storing the events of the stream, organized in windows. A data container structure contains the events that lie between certain boundaries. Typically, the delimiting boundaries refer to time. Since a data container structure can hold a window, the time boundaries refer to the time delimitation of that window. When the stream is partitioned into multiple windows, each window is recorded in a different container, which thereby holds that time delimitation. These containers can be linked to each other in the same logical order as the input stream. The time granularity used for delimiting the windows can be the one used when the partitioning was performed, or it can be set by an operation applied to the data container structures, for example, the combination of different data container structures or the rescaling of windows to a new time granularity.
In an embodiment, the data processing apparatus 300 is configured to provide time-based management of data. To this end, the start and end time delimitations of a data container structure are not the only time references. Within a data container structure there can be time references that correspond to a finer granularity. This internal granularity is set when the structure is generated or when an operation that changes the temporal organization within the structure is applied. As explained further above, a hierarchical structure is obtained in this case. A container can therefore be organized internally on the basis of sub-containers, which correspond to a temporal division of the time interval of the enclosing container. At the lowest granularity, the sub-containers hold only data elements or events. Fig. 7 shows the hierarchy of a data container structure according to an embodiment with an exemplary hierarchy depth of 3. However, depending on the specific needs of the scenario and the successive operations applied, larger hierarchies can be generated using the same principle.
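A hierarchy of depth 3 of the kind described above can be sketched as follows. The `Container` class, the `build` helper, the event shape, and the chosen widths (one hour split into 15-minute and then 5-minute sub-containers) are illustrative assumptions rather than the structure of Fig. 7.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Event = Tuple[float, float]  # (timestamp in seconds, value); hypothetical event shape


@dataclass
class Container:
    start: float
    end: float
    children: List[Union["Container", Event]] = field(default_factory=list)

    def depth(self) -> int:
        """Number of container levels, counting this container."""
        subs = [c for c in self.children if isinstance(c, Container)]
        return 1 + max((c.depth() for c in subs), default=0)

    def events(self) -> List[Event]:
        """All events in chronological order, whatever the nesting."""
        out: List[Event] = []
        for child in self.children:
            out.extend(child.events() if isinstance(child, Container) else [child])
        return out


def build(start: float, end: float, widths: List[float], events: List[Event]) -> Container:
    """Recursively split [start, end) into sub-containers of the given widths."""
    node = Container(start, end)
    inside = [e for e in events if start <= e[0] < end]
    if not widths:
        node.children = sorted(inside)  # lowest granularity: events only
        return node
    t = start
    while t < end:
        node.children.append(build(t, min(t + widths[0], end), widths[1:], inside))
        t += widths[0]
    return node
```

For example, `build(0.0, 3600.0, [900.0, 300.0], events)` yields a one-hour container holding four 15-minute sub-containers, each holding three 5-minute sub-containers that store the events themselves.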
Therefore, in an embodiment, a container can be formed as a collection of other interconnected containers (that is, sub-containers) or as a collection of events or data elements. The container structures themselves can be interconnected with one another, thereby capturing the partitioning of the stream. Each data container structure can have a corresponding record in the metadata 307a and, optionally, in the statistical data 311a. In an embodiment, this is not the case for events or data elements, because it would generate a large number of metadata records. At any granularity, a data container structure can be extended or shrunk merely by moving the time boundaries in the metadata 307a and by updating the links of the affected subset of sub-containers or events. From an implementation perspective, the above functionality of the data container structure can be obtained with a doubly linked list of time-ordered events or data elements with a fixed element size.
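Extending or shrinking a container by moving a time boundary can be sketched over a doubly linked list as follows. The names `Node`, `Window`, `link`, `move_boundary`, and `timestamps` are hypothetical, and the sketch assumes two adjacent windows whose events form one contiguous run of the same list.

```python
from dataclasses import dataclass
from typing import List, Optional


class Node:
    """Element of a doubly linked list of time-ordered events."""

    def __init__(self, ts: float) -> None:
        self.ts = ts
        self.prev: Optional["Node"] = None
        self.next: Optional["Node"] = None


@dataclass
class Window:
    start: float
    end: float
    head: Optional[Node] = None  # first event of the window
    tail: Optional[Node] = None  # last event of the window


def link(stamps: List[float]) -> List[Node]:
    """Build the doubly linked list from the given timestamps."""
    nodes = [Node(ts) for ts in sorted(stamps)]
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a
    return nodes


def move_boundary(left: Window, right: Window, boundary: float) -> None:
    """Shift the shared boundary; only head/tail references change, no event is copied."""
    left.end = right.start = boundary
    # Boundary moved later: events at the head of the right window now belong to the left.
    while right.head is not None and right.head.ts < boundary:
        node = right.head
        if left.head is None:
            left.head = node
        left.tail = node
        if node is right.tail:
            right.head = right.tail = None
        else:
            right.head = node.next
    # Boundary moved earlier: events at the tail of the left window now belong to the right.
    while left.tail is not None and left.tail.ts >= boundary:
        node = left.tail
        if right.tail is None:
            right.tail = node
        right.head = node
        if node is left.head:
            left.head = left.tail = None
        else:
            left.tail = node.prev


def timestamps(win: Window) -> List[float]:
    """Timestamps of the events currently assigned to the window."""
    out, node = [], win.head
    while node is not None:
        out.append(node.ts)
        if node is win.tail:
            break
        node = node.next
    return out
```

Only the events between the old and the new boundary are visited, which matches the idea of touching just the affected subset.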
As already described above, a main goal of embodiments of the invention is to provide a mechanism that saves the partitioning of a stream in a structure in order to increase performance. This goal is achieved by reusing existing computation. To this end, the proposed format enables multiple operations for performing time-based operations on the generated data container structures. Furthermore, performance can be improved by operating directly on the data container structures at the metadata level, for example, by retrieving precomputed statistical data. Finally, a further advantage is the option of managing the data container structures in the form of windows, either in isolation or across other similar structures. Each of these functionalities is exposed through specific operations. Examples of such operations are shown in Fig. 8. In the following, some exemplary operations from each category are described in the context of Fig. 8.
As already described above, embodiments of the invention make it possible to avoid recomputing the partitioning of the data stream when updates become available. Such updates relate to the insertion or addition, deletion, update, and appending of events or data elements in the stream. These operations can be supported by making the required adjustments at the metadata level and by moving events or windows at the level of the data container structures. For example, at the metadata level, a modification can imply an adjustment of the time boundaries of a data container structure and possibly of the reference to the enclosing container for some sub-containers. At the level of the data container structure, an operation can involve moving part of the data in one data container structure to an adjacent data container structure. As an optimization, fine-grained, time-based access to events or data elements can be used to reduce the amount of data that is read from, and subsequently written to, the storage back end holding the data container structures, for example, the memory 309 shown in Fig. 3.
In addition, embodiments of the invention allow temporal control over the stored data. To this end, a data container structure can make use of the time references of the partitioned data, which can be used for fine-grained navigation. However, the windows of the partitioned data have a static time dimension. The data container structure according to the invention supports operations for changing this temporal partitioning or for grouping windows into larger time intervals. Operations for changing the temporal partitioning are typically used to divide a data container structure or window into sub-intervals. This implies generating sub-containers, into which the events are grouped as the data elements are moved. Although a similar mechanism could be used to restructure the data according to a new temporal partitioning, it should be understood that this is equivalent to generating an entirely new data container structure corresponding to the new partitioning of the stream. Alternatively, when windows are to be grouped into larger intervals, this can be done by generating a new container in which the existing windows are grouped. This also implies generating the corresponding metadata fields. This is an important operation, because it makes it possible to obtain a high-level summary view of the stream.
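Grouping existing windows into larger intervals can be sketched at the metadata level as follows. The `Window` record, the `group_windows` function, integer timestamps in seconds, and the assumption that no window straddles a group boundary are all illustrative choices, not details given in the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Window:
    window_id: int
    start: int                       # seconds
    end: int                         # seconds, assumed exclusive
    parent_id: Optional[int] = None  # reference ID of the enclosing window
    children: List[int] = field(default_factory=list)


def group_windows(metadata: Dict[int, Window], width: int) -> List[int]:
    """Create one enclosing container per `width`-long interval; return the new IDs."""
    top_level = sorted(
        (w for w in metadata.values() if w.parent_id is None),
        key=lambda w: w.start,
    )
    next_id = max(metadata, default=0) + 1
    groups: Dict[int, Window] = {}
    for win in top_level:
        slot = win.start // width
        if slot not in groups:
            groups[slot] = Window(next_id, slot * width, (slot + 1) * width)
            next_id += 1
        group = groups[slot]
        group.children.append(win.window_id)
        win.parent_id = group.window_id  # only the reference changes; no event is moved
    for group in groups.values():
        metadata[group.window_id] = group
    return [g.window_id for g in groups.values()]
```

Grouping 48 one-hour windows with `width=86400` produces two day-level containers, each referencing 24 hourly windows, without touching the stored events.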
As mentioned before, the statistical data makes it possible to record the mathematical functions and operators applied to the stream together with their values. This enables fast access to the precomputed data. The operations enabled at the level of the statistical data are: reading a value, searching for the precomputed value of a function, recording or updating a pair of function type and result, and marking whether a result is valid (in particular, in the case of lazy evaluation).
One group of enabled operations consists of functions that apply one data container structure to another. These functions include concatenation, merging, differencing, and comparison. Depending on the specific function, the containers of the source structures are copied into the resulting structure (which can be one of the source structures) and linked according to the logic of the function. This also implies updates at the level of the metadata and the statistical data.
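Two of these functions, concatenation and merging, can be sketched as follows. The representation of a structure as a list of `(start, end, events)` windows and the function names are assumptions made for illustration; the accompanying updates of metadata and statistical data are left out.

```python
import heapq
from typing import List, Tuple

Event = Tuple[int, float]              # (timestamp, value)
Window = Tuple[int, int, List[Event]]  # (start, end, time-ordered events)


def concatenate(a: List[Window], b: List[Window]) -> List[Window]:
    """Append structure b after structure a; b must not start before a ends."""
    if a and b and b[0][0] < a[-1][1]:
        raise ValueError("structures overlap in time; use merge instead")
    return a + b


def merge(a: List[Window], b: List[Window]) -> List[Window]:
    """Merge two structures that share the same window boundaries."""
    if [(s, e) for s, e, _ in a] != [(s, e) for s, e, _ in b]:
        raise ValueError("window boundaries differ")
    return [
        (s, e, list(heapq.merge(ev_a, ev_b)))  # keeps chronological order
        for (s, e, ev_a), (_, _, ev_b) in zip(a, b)
    ]
```

In a fuller implementation, any statistics recorded for the affected windows would have to be recomputed or marked as stale after either operation.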
To highlight some of the advantages provided by embodiments of the invention, an example scenario is described below in the context of Fig. 8. In this example, the average value of bank accounts over each one-hour interval of the last 6 months is computed every 5 minutes. This is an example of the processing required for fraud-prevention analysis (for credit card payments, bank anti-money-laundering, and so on). The value computed for each window is used as a ground reference value for validating new transactions. Such analysis is common across multiple business domains and can therefore be optimized by a solution that optimizes this basic processing operation.
Conventionally, the above scenario is handled as follows. All events are stored in a table. When a new computation is triggered, the stream processing engine reads each event, starting from the most recent one. As the events are read, all one-hour windows are recomputed, and the average of each window is recalculated. With N denoting the number of events, the cost of these operations is: O(N) disk read operations, O(N) "+" operations, and 4392 (6 × 30.5 × 24) "/" operations.
According to an embodiment of the invention, the above scenario can be handled as follows. All events are stored with an internal time reference of 5 minutes and are formatted according to the data container structure of the embodiment, which has one-hour window containers. The metadata of each window container is read and, for each container, a remove operation and an add operation are performed on a series of 5-minute tables. The cost is as follows. The number of reads from disk is reduced to about 8% (only 1/12 of the 5-minute tables in each one-hour window is read), and the subsequent writes are comparable to the reads (depending on the implementation; if all data is stored in the same file, the writes can be eliminated entirely and replaced by updates in the metadata table). The number of "+" operations is reduced to the number of events in the boundary 5-minute windows (which can be regarded as about 8% of all data).
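The remove-and-add step on 5-minute tables can be sketched as follows. The class `HourWindow` and its `(sum, count)` bucket representation are assumptions chosen for illustration; the patent does not prescribe this layout.

```python
from collections import deque
from typing import Deque, Tuple


class HourWindow:
    """One-hour window kept as twelve 5-minute (sum, count) buckets."""

    BUCKETS = 12  # 60 minutes / 5 minutes

    def __init__(self) -> None:
        self.buckets: Deque[Tuple[float, int]] = deque()
        self.total = 0.0
        self.count = 0

    def slide(self, new_sum: float, new_count: int) -> None:
        """Advance by 5 minutes: drop the oldest bucket, add the newest."""
        if len(self.buckets) == self.BUCKETS:
            old_sum, old_count = self.buckets.popleft()
            self.total -= old_sum
            self.count -= old_count
        self.buckets.append((new_sum, new_count))
        self.total += new_sum
        self.count += new_count

    def mean(self) -> float:
        """Average over the current hour, without rereading any event."""
        return self.total / self.count if self.count else 0.0
```

Each slide touches one of the twelve buckets (1/12, roughly 8%) instead of every event in the hour. With floating-point sums, repeated subtraction can accumulate rounding error, so a long-running implementation might periodically recompute the total from the buckets.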
Embodiments of the invention provide, for example, the following advantages. An apparatus and a method are provided that implement a temporal concept for stored data from a data stream. A data container structure is provided for partitioning a stream into data windows, which allows computation cycles on the stream to be optimized by means of memory. A data container structure is provided for efficiently handling discrete updates on the stream. A data container structure is provided for window-based control. A data container structure is provided that supports a hierarchical format for organizing stream events. A data container structure is provided that supports functions and statistics on events.
While a particular feature or aspect of the invention may have been disclosed with respect to only one of several implementations or embodiments, such a feature or aspect may be combined with one or more other features or aspects of the other implementations or embodiments, as may be desired or advantageous for any given or particular application. Furthermore, to the extent that the terms "include", "have", "with", or other variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprise". Also, the terms "exemplary", "for example", and "e.g." merely denote an example, rather than the best or optimal. The terms "coupled" and "connected", along with their derivatives, may have been used. It should be understood that these terms may indicate that two elements cooperate or interact with each other, regardless of whether they are in direct physical or electrical contact or are not in direct contact with each other.
Although specific aspects have been illustrated and described herein, those of ordinary skill in the art will appreciate that a variety of alternative and/or equivalent implementations may be substituted for the specific aspects shown and described without departing from the scope of the present invention. This application is intended to cover any adaptations or variations of the specific aspects discussed herein.
Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.
Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art will readily recognize that there are numerous applications of the invention beyond those described herein. While the present invention has been described with reference to one or more particular embodiments, those skilled in the art will recognize that many changes may be made thereto without departing from the scope of the present invention. It is therefore to be understood that, within the scope of the appended claims and their equivalents, the invention may be practiced otherwise than as specifically described herein.

Claims (15)

1. A data processing apparatus (300) for processing a data stream (301) comprising a plurality of data elements (301a, b) arranged in chronological order, characterized in that the data processing apparatus (300) comprises:
a processor (303) configured to generate a plurality of data container structures (305a, b) on the basis of the data stream (301), wherein each data container structure (305a, b) comprises a subset of the plurality of chronologically ordered data elements (301a, b) of the data stream (301),
wherein the processor (303) is further configured to provide each data container structure (305a, b) with metadata (307a, b), wherein the metadata (307a, b) defines the chronological order of each data container structure (305a, b) relative to the other data container structures (305a, b) of the plurality of data container structures (305a, b).
2. The data processing apparatus (300) according to claim 1, characterized in that each data container structure (305a, b) comprises at least one data window and/or at least one data sub-container, the at least one data window comprising at least one data element of the subset of the plurality of data elements, and the at least one data sub-container comprising at least two data sub-windows, wherein each data sub-window comprises at least one data element of the subset of the plurality of data elements.
3. The data processing apparatus (300) according to claim 1 or 2, characterized in that the metadata (307a, b) of each data container structure (305a, b) comprises a pointer to a data element of the subset of the plurality of data elements (301a, b), the pointer pointing to the first data element in the chronological order of the subset of the plurality of data elements (301a, b).
4. The data processing apparatus (300) according to any one of the preceding claims, characterized in that the processor (303) is further configured to provide each data container structure (305a, b) with statistical data associated with the subset of the plurality of data elements (301a, b) of the data container structure (305a, b).
5. The data processing apparatus (300) according to claim 4, characterized in that the metadata (307a, b) of each data container structure (305a, b) comprises a pointer to the statistical data associated with the subset of the plurality of data elements (301a, b) of the data container structure (305a, b).
6. The data processing apparatus (300) according to any one of the preceding claims, characterized in that the processor (303) is further configured to adjust each data container structure (305a, b) by adding data elements of the data stream (301) to the subset of data elements of the data container structure (305a, b) or by removing data elements from the subset of data elements of the data container structure (305a, b).
7. The data processing apparatus (300) according to any one of the preceding claims, characterized in that the data processing apparatus (300) further comprises a memory (309) for storing the plurality of data container structures (305a, b).
8. A data container structure (305a, b) for storing a plurality of data elements (301a, b) arranged in chronological order in a data stream (301), characterized in that the data container structure (305a, b) comprises:
a subset of the plurality of chronologically ordered data elements (301a, b) of the data stream (301); and
metadata (307a, b) defining the chronological order of the data container structure (305a, b) within the data stream (301).
9. The data container structure (305a, b) according to claim 8, characterized in that the data container structure (305a, b) comprises at least one data window and/or at least one data sub-container, the at least one data window comprising at least one data element of the subset of the plurality of data elements (301a, b), and the at least one data sub-container comprising at least two data sub-windows, wherein each data sub-window comprises at least one data element of the subset of the plurality of data elements (301a, b).
10. The data container structure (305a, b) according to claim 8 or 9, characterized in that the metadata (307a, b) of the data container structure (305a, b) comprises a pointer to a data element of the subset of the plurality of data elements (301a, b), the pointer pointing to the first data element in the chronological order of the subset of the plurality of data elements (301a, b).
11. The data container structure (305a, b) according to any one of claims 8 to 10, characterized in that the data container structure (305a, b) further comprises statistical data associated with the subset of the plurality of data elements (301a, b) of the data container structure (305a, b).
12. The data container structure (305a, b) according to claim 11, characterized in that the metadata (307a, b) of the data container structure (305a, b) comprises a pointer to the statistical data associated with the subset of the plurality of data elements (301a, b) of the data container structure (305a, b).
13. The data container structure (305a, b) according to any one of claims 8 to 12, characterized in that the data container structure (305a, b) is adjustable by adding data elements of the data stream (301) to the subset of data elements of the data container structure (305a, b) or by removing data elements from the subset of data elements of the data container structure (305a, b).
14. A data processing method (400) for processing a data stream (301) comprising a plurality of data elements (301a, b) arranged in chronological order, characterized in that the data processing method (400) comprises:
generating (401) a plurality of data container structures (305a, b) on the basis of the data stream (301), wherein each data container structure (305a, b) comprises a subset of the plurality of chronologically ordered data elements of the data stream (301); and
providing (403) each data container structure (305a, b) with metadata (307a, b), wherein the metadata (307a, b) defines the chronological order of each data container structure (305a, b) relative to the other data container structures (305a, b) of the plurality of data container structures (305a, b).
15. A computer program, characterized by comprising program code for performing the method (400) according to claim 14 when executed on a computer.
CN201680059351.7A 2016-01-05 2016-01-05 Data processing apparatus and method, and data container structure Active CN108140047B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2016/050053 WO2017118474A1 (en) 2016-01-05 2016-01-05 A data processing apparatus and method and a data container structure

Publications (2)

Publication Number Publication Date
CN108140047A true CN108140047A (en) 2018-06-08
CN108140047B CN108140047B (en) 2021-06-29

Family

ID=55080111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680059351.7A Active CN108140047B (en) 2016-01-05 2016-01-05 Data processing apparatus and method, and data container structure

Country Status (2)

Country Link
CN (1) CN108140047B (en)
WO (1) WO2017118474A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114424172A (en) * 2019-10-11 2022-04-29 国际商业机器公司 Virtual memory metadata management

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11599545B2 (en) * 2020-02-19 2023-03-07 EMC IP Holding Company LLC Stream retention in a data storage system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1575464A (en) * 1999-06-18 2005-02-02 奔流系统公司 Segmentation and processing of continuous data streams using transactional semantics
US20070078802A1 (en) * 2005-09-30 2007-04-05 International Business Machines Corporation Apparatus and method for real-time mining and reduction of streamed data
US7383253B1 (en) * 2004-12-17 2008-06-03 Coral 8, Inc. Publish and subscribe capable continuous query processor for real-time data streams
US20080256326A1 (en) * 2007-04-11 2008-10-16 Data Domain, Inc. Subsegmenting for efficient storage, resemblance determination, and transmission
US20110093491A1 (en) * 2009-10-21 2011-04-21 Microsoft Corporation Partitioned query execution in event processing systems
US7945540B2 (en) * 2007-05-04 2011-05-17 Oracle International Corporation Method to create a partition-by time/tuple-based window in an event processing service
CN102456065A (en) * 2011-07-01 2012-05-16 中国人民解放军国防科学技术大学 Methods for storing and querying offline historical statistical data of data stream
CN102456069A (en) * 2011-08-03 2012-05-16 中国人民解放军国防科学技术大学 Incremental aggregate counting and query methods and query system for data stream
CN103218423A (en) * 2013-04-02 2013-07-24 中国科学院信息工程研究所 Data inquiry method and device
CN103916478A (en) * 2014-04-11 2014-07-09 华为技术有限公司 Streaming data cube establishing method and device based on distributed system
US20140258290A1 (en) * 2013-03-07 2014-09-11 International Business Machines Corporation Processing control in a streaming application
CN104090952A (en) * 2014-07-02 2014-10-08 华中科技大学 Method and system for estimating average value of data flow under sliding window
CN105074698A (en) * 2013-02-19 2015-11-18 甲骨文国际公司 Executing continuous event processing (CEP) queries in parallel

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6792433B2 (en) * 2000-04-07 2004-09-14 Avid Technology, Inc. Indexing interleaved media data
JP4687253B2 (en) * 2005-06-03 2011-05-25 株式会社日立製作所 Query processing method for stream data processing system
RU2477883C2 (en) * 2007-08-20 2013-03-20 Нокиа Корпорейшн Segmented metadata and indices for streamed multimedia data
CA2783592A1 (en) * 2009-12-11 2011-06-16 Nokia Corporation Apparatus and methods for describing and timing representations in streaming media files
US8983952B1 (en) * 2010-07-29 2015-03-17 Symantec Corporation System and method for partitioning backup data streams in a deduplication based storage system
US8949240B2 (en) * 2012-07-03 2015-02-03 General Instrument Corporation System for correlating metadata
US9244978B2 (en) * 2014-06-11 2016-01-26 Oracle International Corporation Custom partitioning of a data stream


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114424172A (en) * 2019-10-11 2022-04-29 国际商业机器公司 Virtual memory metadata management
CN114424172B (en) * 2019-10-11 2023-03-21 国际商业机器公司 Virtual memory metadata management

Also Published As

Publication number Publication date
CN108140047B (en) 2021-06-29
WO2017118474A1 (en) 2017-07-13

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220210

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Patentee after: Huawei Cloud Computing Technologies Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.