This application relates to the technical field of computer data processing, and more particularly to a data processing method, apparatus, and device.
Detailed Description
To make the purposes, technical solutions, and advantages of the application clearer, the technical solutions of the application are described clearly and completely below with reference to specific embodiments of the application and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the application, without creative effort, shall fall within the protection scope of this application.
The technical solutions provided by the various embodiments of the application are described in detail below with reference to the drawings.
Before the present invention is described, the concepts it relies on are briefly introduced.
Stream data (data stream): a sequence of data that arrives in order, in large volume, rapidly, and continuously.
Stream computing: real-time analysis of large-scale stream data, which extracts useful knowledge and information from massive and varied continuous data streams and sends the extraction results to the next computing node.
Data warehouse: a subject-oriented, integrated, relatively stable collection of data that reflects historical changes and can be used to support management decisions. A data warehouse stores current and historical data in one place, integrates data from one or more data sources, and supplies the integrated data as decision data or analysis reports for decisions across the enterprise.
Response time: the delay between the time a behavior is completed and the time the data of that behavior becomes usable in the data warehouse.
Real-time data warehouse: a data warehouse whose response time is negligible under real-time conditions. In a real-time data warehouse, the response time from obtaining data from a data source to generating summary data is usually kept within ten minutes.
Business entity: in a real-time data warehouse, an abstraction used to consolidate, classify, analyze, and use the data in enterprise information systems at a higher level. Each business entity can correspond to an analysis object in some macro-level analysis field of the enterprise. For example, for an e-commerce company, business entities can include but are not limited to orders, complaint work orders, and refund orders.
Dual-stream join: a method that merges two data streams with a join operator to obtain the merged stream data. The join operator extracts different associated fields from the two data streams to obtain the complete set of associated fields.
Log-type key-value database: a key-value database that produces database logs while it runs. A database log is a file that records all transactions of the database and the modifications each transaction makes to the database; the data written to the database can be queried from the database log. A key-value database (key-value store) stores data as a set of key-value pairs, in which the key is the unique identifier and the value is the data to be stored.
Multi-dimensional data: a set of data comprising multiple dimensions. A dimension reflects one class of attributes of the business; the set of attributes of that class constitutes a dimension.
As described in the background section, when a real-time stream computing task is executed in a real-time data warehouse at present, the stream data provided by multiple data sources must be merged using the dual-stream join technique. For example, when there are three data sources (a first, a second, and a third data source), the dual-stream join technique must first be used to merge the stream data from the first data source with the stream data from the second data source to obtain a first merged stream; the dual-stream join technique is then used again to merge the first merged stream with the stream data from the third data source to obtain a second merged stream. Because merging stream data with the dual-stream join technique is inefficient, and because multiple dual-stream join operations must be executed in sequence when there are many data sources, the data processing delay is aggravated and the operating requirements of a real-time data warehouse cannot be met.
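The cost of chaining joins can be illustrated with a minimal sketch. It uses a batch join over in-memory lists as a stand-in for a true streaming join; `pairwise_join`, `chain_join`, the `refund_id` key, and the sample records are hypothetical names chosen only for illustration and do not come from the application.

```python
from functools import reduce


def pairwise_join(left, right, key="refund_id"):
    """Join two record lists on `key` (a batch stand-in for one dual-stream join)."""
    index = {rec[key]: rec for rec in right}
    return [{**rec, **index[rec[key]]} for rec in left if rec[key] in index]


def chain_join(sources, key="refund_id"):
    """Merge N sources with N - 1 pairwise joins executed one after another."""
    joins = 0

    def step(acc, src):
        nonlocal joins
        joins += 1
        return pairwise_join(acc, src, key)

    merged = reduce(step, sources[1:], sources[0])
    return merged, joins


# Hypothetical sample data: three sources describing the same refund order.
source_1 = [{"refund_id": "123", "product": "cup"}]
source_2 = [{"refund_id": "123", "buyer_message": "item damaged"}]
source_3 = [{"refund_id": "123", "picture": "img-001"}]

merged, join_count = chain_join([source_1, source_2, source_3])
```

With three sources the sketch performs two joins in sequence, and every additional source lengthens the chain by one more join, which is the delay problem described above.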
To address this problem, the inventors noted that after data is stored in a log-type key-value database according to a certain naming rule, the stored data can have an association relationship within the database, and all data having an association relationship can be extracted according to the database log information. This method, which establishes associations between related data through read and write operations on a log-type key-value database, runs efficiently and provides a basis for a data processing method with a smaller data delay.
In view of this, the present invention proposes storing the stream data of a first business entity, obtained from each data source, in a log-type key-value database, and reading the multi-dimensional data of the first business entity written to the log-type key-value database according to the database log information. By performing efficient operations on the log-type key-value database, namely writing data, collecting logs, and looking up the written data in reverse from the logs, the stream data of the first business entity from multiple data sources is merged, which reduces the data processing delay.
The concepts and the basic principle of the embodiments of the present invention have now been briefly explained. The specific implementation of the invention is described in further detail below with reference to Fig. 1 to Fig. 4.
Fig. 1 is a schematic flowchart of a data processing method provided by an embodiment of this specification. From a program perspective, the process may be executed by a data warehouse application installed on a real-time data warehouse server. The method is suited to scenarios that demand highly real-time processing of the stream data of business entities, for example scenarios that provide decision data in real time or that need to display system data. Each business entity can correspond to a business analysis object of an enterprise; in practice, because enterprises provide different services, their business entities also differ. For example, for an enterprise providing online retail services, business entities may include but are not limited to refund orders, goods orders, buyer accounts, merchant accounts, and particular commodities. For an enterprise providing stock trading services, business entities include but are not limited to investor accounts, stock delivery orders, and the stock issued by a particular joint-stock company.
The following describes the specific implementation of the method using a scenario in which the stream data of the refund order business entity must be processed. It should be understood that processing the stream data related to refund orders with this method is only an illustrative example and shall not be construed as limiting the method. As shown in Fig. 1, the process may include:
Step 101: Obtain the stream data of a first business entity from each data source.
In the embodiments of this specification, each business entity usually involves multiple data sources, and the stream data provided by each data source contains data of at least one dimension of the first business entity. Specifically, the stream data of the first business entity can be obtained from each data source by a real-time stream computing task, which can be implemented on a stream computing engine such as Flink, Spark Streaming, Storm, or Beam. For example, for the refund order business entity, the data sources may include but are not limited to an instant messaging information data source, a message information data source, a refund order basic information data source, and a picture information data source.
Step 102: Store the stream data in a log-type key-value database, where the data of the first business entity in the log-type key-value database have an association relationship with one another.
In the embodiments of this specification, the stream data obtained from each data source can be stored in the log-type key-value database according to a preset naming rule, so that the data of the first business entity in the database have an association relationship. Specifically, because data in a log-type key-value database is stored as key-value pairs, a naming rule can make the key name of each item of data correspond to the business entity the data belongs to and the data source it comes from. The association relationship between the data of the first business entity in the log-type key-value database can then be determined from the key names.
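One way such a naming rule could look is sketched below. The key format `<entity type>:<entity id>|<data source>` and the helper names `make_key`, `parse_key`, and `same_entity` are assumptions made for illustration; the application does not prescribe a concrete format.

```python
def make_key(entity_type: str, entity_id: str, source: str) -> str:
    """Build a key name that encodes the business entity and the data source.

    The "<entity type>:<entity id>|<data source>" layout is hypothetical.
    """
    return f"{entity_type}:{entity_id}|{source}"


def parse_key(key: str) -> tuple[str, str, str]:
    """Split a key name back into (entity type, entity id, data source)."""
    entity, source = key.split("|", 1)
    entity_type, entity_id = entity.split(":", 1)
    return entity_type, entity_id, source


def same_entity(key_a: str, key_b: str) -> bool:
    """Two items are associated when their keys name the same business entity."""
    return parse_key(key_a)[:2] == parse_key(key_b)[:2]


# Two data sources writing about refund order 123 produce associated keys.
basic_key = make_key("refund_order", "123", "basic_info")
message_key = make_key("refund_order", "123", "message_info")
other_key = make_key("refund_order", "124", "basic_info")
```

Because the association is carried entirely by the key name, no join operator is needed to relate the two items; comparing the parsed keys is sufficient.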
Step 103: Obtain the log information of the log-type key-value database according to a preset acquisition condition.
In the embodiments of this specification, the log information of the log-type key-value database can be obtained automatically and periodically based on the preset acquisition condition.
Step 104: Determine, according to the log information and the association relationship, the multi-dimensional data of the first business entity in the log-type key-value database, the multi-dimensional data comprising data from each data source.
In the embodiments of this specification, obtaining the log information can serve as a trigger condition: when log information is obtained, the real-time stream computing task determines the multi-dimensional data of the first business entity in the log-type key-value database according to the log information. The multi-dimensional data of the first business entity includes data from every data source of the first business entity, which accomplishes the merging of the stream data provided by the data sources.
In the embodiments of this specification, the stream data of the first business entity obtained from each data source is stored in the log-type key-value database. By performing efficient operations on the log-type key-value database, namely writing data, collecting logs, and looking up the written data in reverse from the logs, the stream data of the first business entity from multiple data sources is merged, which improves the efficiency of merging stream data and reduces the data processing delay.
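Steps 101 to 104 can be sketched end to end with a toy in-memory store. This is a minimal sketch under stated assumptions: `LogKVStore`, `merged_entities`, and the sample records are hypothetical, and a real log-type key-value database would persist both the data and the log rather than keep them in Python objects.

```python
from collections import defaultdict


class LogKVStore:
    """Toy in-memory stand-in for a log-type key-value database."""

    def __init__(self):
        self.rows = defaultdict(dict)  # row key -> {(source, field): value}
        self.log = []                  # append-only record of every write

    def put(self, row_key, source, field, value):
        """Step 102: write one item and record the write in the log."""
        self.rows[row_key][(source, field)] = value
        self.log.append((row_key, source, field))

    def log_since(self, offset):
        """Step 103: return the log entries produced after `offset`."""
        return self.log[offset:]


def merged_entities(store, offset):
    """Step 104: look up, in reverse from the log, every row touched by new writes."""
    touched = {row_key for row_key, _, _ in store.log_since(offset)}
    return {row_key: dict(store.rows[row_key]) for row_key in touched}


store = LogKVStore()
# Steps 101-102: items for refund order 123 arrive from two data sources.
store.put("123", "basic_info", "product", "cup")
store.put("123", "message_info", "buyer_message", "item damaged")
# Steps 103-104: collect the log and read back the merged multi-dimensional data.
result = merged_entities(store, offset=0)
```

The merge here is a by-product of both sources writing under the same row key; the log only tells the task which rows to read back.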
Based on the method in Fig. 1, the embodiments of this specification also provide some specific implementations of the method, which are described below.
In a real-time data warehouse, the merged stream data usually needs to be counted and summarized according to summarization logic so that the required summary data can be provided for decisions at all levels of the enterprise. Based on the method in Fig. 1, an embodiment of this specification gives an implementation that summarizes the multi-dimensional data related to the first business entity. As shown in Fig. 2, the implementation proceeds as follows:
Step 201: Obtain the stream data of the first business entity from each data source.
Step 202: Store the stream data in a log-type key-value database, where the data of the first business entity in the log-type key-value database have an association relationship with one another.
Step 203: Obtain the log information of the log-type key-value database according to a preset acquisition condition.
Step 204: Determine, according to the log information and the association relationship, the multi-dimensional data of the first business entity in the log-type key-value database, the multi-dimensional data comprising data from each data source.
In the embodiments of this specification, steps 201-204 can be implemented in the same way as steps 101-104 in Fig. 1, and details are not repeated here.
Step 205: Obtain related dimension data.
In the embodiments of this specification, the related dimension data can be the data of one or more business entities that have an association relationship with the first business entity. In practice, the business entities associated with the first business entity can be specified in advance as needed. For example, when the first business entity is the refund order for commodity C bought by user A on retail platform B, the business entities associated with the first business entity may include: commodity C, the account of user A on retail platform B, the account of the merchant selling commodity C, and so on.
The related dimension data can either be data obtained by processing, with the method of steps 201-204, the stream data of a business entity associated with the first business entity, or be static data specified in advance. In practice, the real-time dimension table of a business entity associated with the first business entity, queried from the data warehouse, can be used as the related dimension data. For example, for the account of user A on retail platform B, which is a business entity associated with the first business entity, the related dimension data can be a real-time dimension table containing dynamic data of the account, such as the number of orders and the number of refund orders, or a real-time dimension table containing static data of the account, such as the associated mobile phone number and the delivery address.
Step 206: Divide the related dimension data and the multi-dimensional data of the first business entity to obtain the data set of each business subdomain.
In the embodiments of this specification, the business subdomain to which each item of data of each business entity belongs can be specified in advance according to the analysis needs of the enterprise, so that the related dimension data and the multi-dimensional data of the first business entity can be divided by business subdomain, which makes the subsequent summarization easier. The data in each business subdomain can describe the business logic from a different angle and at a different level. For example, when the first business entity is a refund order, the business subdomains corresponding to the data of the first business entity may include a merchant subdomain, a buyer subdomain, a commodity subdomain, a refund subdomain, and so on.
Step 207: Publish the detail message of the first business entity, the detail message including the data set of each business subdomain.
In the embodiments of this specification, message-oriented middleware (Active Messenger) can be used to publish the detail message of the first business entity, which serves as the real-time detail layer (data warehouse detail) of the real-time data warehouse. Specifically, the message-oriented middleware can be implemented with Notify, MetaQ, or the like.
Step 208: According to the detail message, summarize the data in at least one of the business subdomain data sets to obtain summary data.
In the embodiments of this specification, because step 206 has already produced the data of each business subdomain, step 208 no longer needs to identify and divide the business subdomain of each item of dimension data in the detail message. In step 208, the data of the required business subdomains can be extracted directly from the detail message according to preset summarization logic and summarized to obtain summary data. This data can serve as the summary layer (data warehouse service) of the real-time data warehouse, and the summarization steps are simplified. For example, the data of the commodity subdomain can be summarized to obtain the refund rate of the commodity, so that the merchant can decide from the refund rate whether to continue selling the commodity. Alternatively, the data of the buyer subdomain can be summarized to obtain the order refund rate of the buyer, in order to decide whether the buyer is a malicious user. Alternatively, the data of the commodity subdomain and the refund subdomain can be summarized to provide decision data on whether the refund order meets the refund conditions.
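The summarization in step 208 can be sketched as a small aggregation over detail messages. Everything concrete here is an assumption made for illustration: the message layout, the field names `product_name` and `refund_quantity`, the `units_sold` lookup, and the definition of refund rate as refunded units divided by units sold are not specified in the application.

```python
from collections import Counter


def refund_rate_by_product(detail_messages, units_sold):
    """Summarize refunded units per product and divide by units sold.

    `detail_messages` is a list of dicts already divided by business subdomain
    (hypothetical layout); `units_sold` maps product name to units sold.
    """
    refunded = Counter()
    for message in detail_messages:
        # The subdomain is read directly; no classification step is needed here.
        product = message["commodity_subdomain"]["product_name"]
        quantity = message["refund_subdomain"]["refund_quantity"]
        refunded[product] += quantity
    return {product: refunded[product] / units_sold[product] for product in refunded}


# Hypothetical detail messages for two refund orders of the same product.
detail_messages = [
    {"commodity_subdomain": {"product_name": "cup"}, "refund_subdomain": {"refund_quantity": 1}},
    {"commodity_subdomain": {"product_name": "cup"}, "refund_subdomain": {"refund_quantity": 2}},
]
rates = refund_rate_by_product(detail_messages, units_sold={"cup": 12})
```

Because each detail message arrives already divided by subdomain, the aggregation only has to index into the subdomains it needs.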
In the embodiments of this specification, by performing efficient operations on the log-type key-value database, namely writing data, collecting logs, and looking up the written data in reverse from the logs, the stream data of the first business entity from multiple data sources is merged, which improves the efficiency of merging stream data and reduces the data processing delay. The related dimension data and the multi-dimensional data of the first business entity are divided by business subdomain to obtain the data set of each business subdomain, and these data sets are then published as the detail message of the first business entity. This simplifies the summarization steps, which further improves data processing efficiency and reduces the data processing delay. Moreover, in this implementation, if a new data source must be added for business reasons, only steps 201 and 202 need to be modified so that the stream data obtained from the new data source is written to the log-type key-value database, and step 208 adapted so that the new data is summarized according to the summarization logic. As the number of data sources grows, the data link length in this implementation stays constant, and maintainability and stability are good.
In the embodiments of this specification, the log-type key-value database includes but is not limited to databases such as HBase or Redis. This specification provides a specific implementation of storing the stream data in the log-type key-value database when HBase is used.
In this implementation, the stream data can be stored in the log-type key-value database according to a naming rule such that, in HBase, the data of the first business entity from different data sources share the same row key name; the column family name of any item of data of the first business entity corresponds to the data source that the item comes from; and the column name of any item of data of the first business entity corresponds to the business subdomain to which the item belongs.
In the embodiments of this specification, data in HBase can be located quickly by the names of three dimensions: row key (rowkey), column family, and column. The row key is the primary key that uniquely identifies a row; every row includes at least one column family, and every column family includes at least one column of data. Data in HBase is versioned: a version is saved every time data is generated or modified, and the version is a timestamp.
In the embodiments of this specification, the data of the first business entity stored in HBase is illustrated using the scenario in which the first business entity is a refund order. Table 1 shows the multi-dimensional data, stored in HBase, of the refund order whose refund order number is 123. As shown in Table 1:
In Table 1, the refund order number 123 is used as the row key name in HBase. The refund order has two data sources: the refund order basic information data source and the message information data source. The refund order basic information data source provides rapidly and continuously arriving data sequences of three dimensions: the name of the refunded commodity, the quantity of the refunded commodity, and the buyer account. The message information data source provides rapidly and continuously arriving data sequences of two dimensions: the buyer message and the merchant message. The name of the refunded commodity belongs to the commodity subdomain, the quantity of the refunded commodity belongs to the refund subdomain, the buyer account and the buyer message belong to the buyer subdomain, and the merchant message belongs to the merchant subdomain.
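The layout described for Table 1 can be pictured as a nested mapping of row key, column family, column, and timestamp. The sketch below is illustrative only: the identifier spellings and the integer timestamps `T1` and `T2` are assumptions, and values that the text does not state are left as `None` rather than invented.

```python
T1, T2 = 1, 2  # placeholder timestamps standing in for t1 < t2

# row key -> column family (data source) -> column ("<subdomain>-<field>") -> {timestamp: value}
table = {
    "123": {
        "refund_basic_info": {
            "commodity_subdomain-commodity_name": {T1: None},   # value not given in the text
            "refund_subdomain-refund_quantity": {T1: None},     # value not given in the text
            "buyer_subdomain-buyer_account": {T1: "xiaohong", T2: "xiaohong"},
        },
        "message_info": {
            "buyer_subdomain-buyer_message": {T1: "item damaged, please confirm"},
            "merchant_subdomain-merchant_message": {T1: None},  # value not given in the text
        },
    }
}


def latest(table, row_key, family, column):
    """Return (timestamp, value) of the newest version of one cell."""
    versions = table[row_key][family][column]
    newest = max(versions)
    return newest, versions[newest]


account_version = latest(table, "123", "refund_basic_info", "buyer_subdomain-buyer_account")
```

Reading everything stored under row key `"123"` yields the data of both sources at once, which is the merged multi-dimensional view of the refund order.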
In the embodiments of this specification, storing the stream data of the first business entity in the log-type key-value database establishes the association relationship among the multiple streams of the first business entity and thereby merges them. Because reading or writing data in HBase usually takes only about 10 milliseconds, the delay of merging multiple streams in the embodiments of this specification can be kept within tens of milliseconds, whereas merging multiple streams with the dual-stream join method takes at least several hundred milliseconds. The merging method in the embodiments of this specification is therefore more efficient and has a shorter data processing delay. In addition, with the dual-stream join method, every added data source requires one more dual-stream join operation and lengthens the data link, which not only affects data processing efficiency but also requires substantial changes to the data processing program and affects its stability. With the implementation provided in this embodiment, the data link length stays constant when the number of data sources changes, so maintainability and stability are better.
In the embodiments of this specification, when HBase is used as the log-type key-value database, the preset acquisition condition may include at least one of the following: a preset acquisition time is reached, or the data volume of newly added log information exceeds a preset threshold. The newly added log information is the log information generated by HBase between the last log collection time and the current time.
The log information of HBase is the HLog file, which is the log that HBase generates by implementing WAL (write-ahead log). Specifically, every modification to data in HBase is written to the MemStore; after the data is written successfully, HBase writes the record to the HLog, generating the HLog file.
Obtaining the log information of the log-type key-value database according to the preset acquisition condition may specifically include:
collecting the log information of HBase when the preset acquisition time is reached; or collecting the log information of HBase when the data volume of the newly added log information exceeds the preset threshold.
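The two acquisition conditions can be expressed as a single predicate. This is a minimal sketch; the function name `should_collect`, its parameters, and the modeling of the preset acquisition time as a fixed interval after the last collection are assumptions, not part of the application.

```python
def should_collect(now: float, last_collect_time: float, interval_s: float,
                   new_log_bytes: int, threshold_bytes: int) -> bool:
    """Return True when at least one preset acquisition condition is met.

    Condition 1: the preset acquisition time has been reached, modeled here
                 (as an assumption) as `interval_s` seconds after the last collection.
    Condition 2: the log data added since the last collection exceeds the threshold.
    """
    time_reached = now - last_collect_time >= interval_s
    volume_exceeded = new_log_bytes > threshold_bytes
    return time_reached or volume_exceeded


# Time condition met, no new log volume.
by_time = should_collect(100.0, 40.0, 60.0, new_log_bytes=0, threshold_bytes=1024)
# Volume condition met before the interval has elapsed.
by_volume = should_collect(50.0, 40.0, 60.0, new_log_bytes=2048, threshold_bytes=1024)
# Neither condition met: volume equals the threshold but does not exceed it.
neither = should_collect(50.0, 40.0, 60.0, new_log_bytes=1024, threshold_bytes=1024)
```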
In the embodiments of this specification, when HBase is used as the log-type key-value database, determining the multi-dimensional data of the first business entity in the log-type key-value database may specifically include:
determining, according to the log information, the column family to which the newly added data of the first business entity in HBase belongs; determining the row key corresponding to that column family; and determining the data in HBase corresponding to that row key as the multi-dimensional data of the first business entity.
In the embodiments of this specification, because the stream data provided by every data source of the first business entity has the same row key name, all the data corresponding to the row key of any item of data of the first business entity can be extracted as the multi-dimensional data of the first business entity, which yields the multi-dimensional data obtained by merging the multiple streams of the first business entity.
In the embodiments of this specification, for step 206, dividing the related dimension data and the multi-dimensional data of the first business entity may specifically include:
dividing the multi-dimensional data of the first business entity according to the correspondence between the column name of the column to which any item of data of the first business entity belongs and the business subdomain to which that item belongs; and dividing the related dimension data according to the correspondence between the related dimension data and the business subdomains of the first business entity.
In the embodiments of this specification, for any one of the business subdomain data sets, the data in the set can be in key-value pair format or JSON (JavaScript Object Notation) format. Each item of data in the set corresponds to one field, and the set may include multiple fields.
In the embodiments of this specification, the multi-dimensional data of the first business entity can be divided according to the correspondence between the column name of each column of data in HBase and the business subdomain to which the data belongs. For example, as shown in Table 1, the column name "buyer subdomain-buyer account" of the buyer account dimension shows that the data of this dimension belongs to the buyer subdomain, so the buyer account column can be assigned to the buyer subdomain data set. Dividing the other data in Table 1 by the same principle shows that the buyer subdomain data set contains three fields: "buyer account with timestamp t1 - xiaohong", "buyer account with timestamp t2 - xiaohong", and "buyer message with timestamp t1 - item damaged, please confirm".
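The division by column name, and the JSON form that a subdomain data set may take, can be sketched as follows. The function `divide_by_subdomain`, the underscore spelling of the column names, and the per-field dictionary layout are assumptions for illustration; only the `<subdomain>-<field>` naming pattern and the three buyer subdomain fields come from the text.

```python
import json
from collections import defaultdict


def divide_by_subdomain(cells):
    """Group cells into business subdomain data sets using the column name.

    `cells` is an iterable of (column_name, timestamp, value), where
    column_name follows the "<subdomain>-<field>" pattern.
    """
    subdomain_sets = defaultdict(list)
    for column_name, timestamp, value in cells:
        subdomain, field = column_name.split("-", 1)
        subdomain_sets[subdomain].append(
            {"field": field, "timestamp": timestamp, "value": value}
        )
    return dict(subdomain_sets)


# The three buyer subdomain fields named in the text.
cells = [
    ("buyer_subdomain-buyer_account", "t1", "xiaohong"),
    ("buyer_subdomain-buyer_account", "t2", "xiaohong"),
    ("buyer_subdomain-buyer_message", "t1", "item damaged, please confirm"),
]
subdomain_sets = divide_by_subdomain(cells)
detail_json = json.dumps(subdomain_sets, ensure_ascii=False)
```

A column written by a new data source is routed by the same prefix rule, so the division logic itself does not change when sources are added.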
In the embodiments of this specification, each subdomain data set may include multiple fields. When the number of data sources of the first business entity increases, the column names under which the stream data of the new data source is stored in HBase are made to correspond to the business subdomains that the data belongs to, so the stream data provided by the new data source can be divided automatically according to that correspondence, yielding the updated business subdomain data sets. In this embodiment, when the number of data sources increases, the stream data provided by the new data source only needs to be written to HBase according to the preset naming rule, and the fields provided by the new data source are assigned automatically to the corresponding business subdomain data sets. The data link length is fixed, which not only improves the maintainability and stability of the scheme but also reduces its resource consumption.
In the embodiments of this specification, by performing efficient operations on HBase, namely writing data, collecting logs, and looking up the written data in reverse from the logs, the stream data of the first business entity from multiple data sources is merged. Because reading or writing data in HBase usually takes about ten milliseconds, and the delay of a real-time stream computing task is about tens of milliseconds, the data delay of the scheme provided by the embodiments of this specification, from obtaining the stream data provided by each data source to obtaining the final summary data, can be kept at the sub-second level (i.e., several hundred milliseconds), whereas the data processing delay of a summarization scheme based on the dual-stream join technique is usually several seconds to tens of seconds. The scheme provided by the embodiments of this specification reduces the data processing delay, consumes few resources, has a fixed data link length, and offers high maintainability and stability, and thus provides an implementation for building a sub-second, low-cost, low-resource-consumption, and highly maintainable real-time data warehouse.
In the embodiments of this specification, before the stream data of the first business entity is obtained from each data source, the method may also include:
obtaining a stream computing task, the stream computing task being used to obtain and process the stream data of the first business entity; determining each data source of the first business entity, the data source being used to provide the stream data of the first business entity; and subscribing the stream computing task to the data sources, so that when the stream computing task is executed, the stream data of the first business entity is obtained from the subscribed data sources and processed in real time.
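The subscription step can be sketched with a toy task object. This is a simplified illustration under stated assumptions: `StreamingTask`, `subscribe`, and `run` are hypothetical names, the sources are finite Python lists consumed one after another, and a real stream computing engine would consume unbounded sources concurrently.

```python
class StreamingTask:
    """Toy stand-in for a stream computing task that consumes subscribed sources."""

    def __init__(self, handler):
        self.handler = handler  # called once for every record obtained
        self.sources = []

    def subscribe(self, source):
        """Register a data source of the first business entity with this task."""
        self.sources.append(source)

    def run(self):
        """Obtain records from every subscribed source and process each one."""
        for source in self.sources:
            for record in source:
                self.handler(record)


received = []
task = StreamingTask(handler=received.append)
# Hypothetical finite sources standing in for unbounded data streams.
task.subscribe([{"refund_id": "123", "product": "cup"}])
task.subscribe([{"refund_id": "123", "buyer_message": "item damaged"}])
task.run()
```

In the method described above, the handler would write each record to the log-type key-value database under the preset naming rule.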
Based on the same idea, the embodiments of this specification also provide an apparatus corresponding to the above method. Fig. 3 is a schematic structural diagram of a data processing apparatus, corresponding to the method in Fig. 1, provided by an embodiment of this specification. As shown in Fig. 3, the apparatus may include:
a first obtaining module 301, configured to obtain the stream data of a first business entity from each data source;
a storage module 302, configured to store the stream data in a log-type key-value database, where the data of the first business entity in the log-type key-value database have an association relationship with one another;
a second obtaining module 303, configured to obtain the log information of the log-type key-value database according to a preset acquisition condition; and
a first determining module 304, configured to determine, according to the log information and the association relationship, the multi-dimensional data of the first business entity in the log-type key-value database, the multi-dimensional data comprising data from each data source.
In the embodiments of this specification, the storage module 302 stores the stream data of the first business entity obtained from each data source in the log-type key-value database, and the first determining module 304 looks up, in reverse from the log information of the log-type key-value database, the data of the first business entity written to the database. The multi-dimensional data of the first business entity obtained by merging the multiple streams is thereby obtained, which improves the efficiency of merging stream data and reduces the data processing delay.
In the embodiments of this specification, the data processing apparatus may also include:
a third obtaining module, configured to obtain related dimension data;
a division module, configured to divide the related dimension data and the multi-dimensional data of the first business entity to obtain the data set of each business subdomain;
a publishing module, configured to publish the detail message of the first business entity, the detail message including the data set of each business subdomain; and
a summarizing module, configured to summarize, according to the detail message, the data in at least one of the business subdomain data sets to obtain summary data.
In the embodiments of this specification, the division module of the data processing device may divide the relevant dimension data and the multi-dimensional data of the first business entity by business subdomain to obtain the business subdomain data sets. The publishing module publishes these data sets as the detail message of the first business entity, which simplifies the summarizing step and thereby further improves data processing efficiency and reduces data processing latency. Moreover, when data sources are added, it is only necessary to write the flow data obtained from the newly added data source into the log-type key-value database and to adapt the summarizing logic so that the newly added data is summarized. In other words, when the number of data sources increases, the length of the data link in this data processing device stays the same, so maintainability and stability are good.
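The summarizing step described above can be illustrated with a minimal Python sketch. The message layout (an `entity` key plus a `subdomains` dictionary), the subdomain names, and the field names are assumptions made only for illustration; the specification does not prescribe them.

```python
def summarize(detail_messages, subdomain, field):
    """Sum one numeric field of one business subdomain across detail messages."""
    total = 0.0
    for message in detail_messages:
        # Each detail message carries one key-value set per business subdomain,
        # so the summary reads only the subdomain it needs.
        values = message["subdomains"].get(subdomain, {})
        total += values.get(field, 0.0)
    return total


# Hypothetical detail messages for two business entities.
detail_messages = [
    {
        "entity": "order-1001",
        "subdomains": {
            "trade": {"trade_amount": 25.0},
            "delivery": {"delivery_status": "shipped"},
        },
    },
    {
        "entity": "order-1002",
        "subdomains": {"trade": {"trade_amount": 15.0}},
    },
]

trade_total = summarize(detail_messages, "trade", "trade_amount")
```

Because the data is already grouped by subdomain when it is published, a summary over one subdomain does not need to inspect the others.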
In the embodiments of this specification, the log-type key-value database may be HBase, and the storage module may specifically be configured to:
Store the flow data in HBase, where: the data of the first business entity from different data sources share the same row key; the column family name of any piece of data of the first business entity corresponds to the data source from which that data comes; and the column name of any piece of data of the first business entity corresponds to the business subdomain to which that data belongs.
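The storage layout described above can be modeled with a small in-memory sketch in Python. It uses nested dictionaries as a stand-in for an HBase table rather than a real HBase client, and the row key, column family names, and column names are hypothetical.

```python
from collections import defaultdict


def new_table():
    """In-memory stand-in for a table: row key -> column family -> column -> value."""
    return defaultdict(lambda: defaultdict(dict))


def put(table, row_key, family, column, value):
    """Write one cell using the row key / column family / column layout."""
    table[row_key][family][column] = value


table = new_table()

# Two data sources write data about the same business entity.
# Same row key; one column family per data source (assumed names).
put(table, "order-1001", "src_payment", "trade_amount", 25.0)
put(table, "order-1001", "src_logistics", "delivery_status", "shipped")

# Reading the row returns the data from every source in a single lookup.
row = {family: dict(columns) for family, columns in table["order-1001"].items()}
```

Sharing one row key across sources is what lets a single row read return the merged view of the entity.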
In the embodiments of this specification, the division module may specifically be configured to:
Divide the multi-dimensional data of the first business entity according to the correspondence between the column name of the column to which any piece of data of the first business entity belongs and the business subdomain to which that data belongs; and divide the relevant dimension data according to the correspondence between the relevant dimension data and the business subdomains of the first business entity.
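As a rough sketch of this division step, the following Python groups both kinds of data by business subdomain. The two mapping tables and all field names are assumptions for illustration; how the correspondence is actually stored is not specified here.

```python
# Hypothetical correspondence between column names and business subdomains.
COLUMN_TO_SUBDOMAIN = {
    "trade_amount": "trade",
    "trade_currency": "trade",
    "delivery_status": "delivery",
}

# Hypothetical correspondence between dimension fields and business subdomains.
DIMENSION_TO_SUBDOMAIN = {
    "merchant_region": "trade",
    "carrier_name": "delivery",
}


def divide(multi_dim_row, dimension_data):
    """Group entity columns and dimension fields into per-subdomain key-value sets."""
    subdomain_sets = {}
    for columns in multi_dim_row.values():  # one dict of columns per column family
        for column, value in columns.items():
            subdomain = COLUMN_TO_SUBDOMAIN[column]
            subdomain_sets.setdefault(subdomain, {})[column] = value
    for field, value in dimension_data.items():
        subdomain = DIMENSION_TO_SUBDOMAIN[field]
        subdomain_sets.setdefault(subdomain, {})[field] = value
    return subdomain_sets


row = {
    "src_payment": {"trade_amount": 25.0, "trade_currency": "CNY"},
    "src_logistics": {"delivery_status": "shipped"},
}
dimensions = {"merchant_region": "east", "carrier_name": "carrier-a"}
subdomain_sets = divide(row, dimensions)
```

Each resulting set is a plain key-value mapping with one entry per field.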
In the embodiments of this specification, for any one of the business subdomain data sets generated by the division module, the data in that set is in key-value pair format, and each piece of data in that set corresponds to one field.
In the embodiments of this specification, the preset acquisition condition may include at least one of the following: a preset acquisition time is reached, or the data volume of newly added log information is greater than a preset threshold. The newly added log information is the log information generated by HBase between the last log acquisition time and the current time.
The second obtaining module 303 may specifically be configured to: acquire the log information of HBase when the preset acquisition time is reached; or acquire the log information of HBase when the data volume of the newly added log information is greater than the preset threshold.
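The two triggers can be expressed as a single predicate. The sketch below is illustrative only; the parameter names and the units (seconds and bytes) are assumptions.

```python
def should_collect(now, next_acquisition_time, new_log_bytes, threshold_bytes):
    """Return True when either preset acquisition condition is met.

    now / next_acquisition_time: timestamps in seconds (assumed unit).
    new_log_bytes: size of log information generated since the last acquisition.
    threshold_bytes: preset threshold for the newly added log volume.
    """
    time_reached = now >= next_acquisition_time
    volume_exceeded = new_log_bytes > threshold_bytes
    return time_reached or volume_exceeded


# Time trigger fires even though little log data has accumulated.
by_time = should_collect(now=100.0, next_acquisition_time=90.0,
                         new_log_bytes=10, threshold_bytes=1024)
# Volume trigger fires before the scheduled time.
by_volume = should_collect(now=50.0, next_acquisition_time=90.0,
                           new_log_bytes=2048, threshold_bytes=1024)
# Neither condition is met.
neither = should_collect(now=50.0, next_acquisition_time=90.0,
                         new_log_bytes=10, threshold_bytes=1024)
```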
In the embodiments of this specification, the first determining module 304 may specifically be configured to:
Determine, according to the log information, the column family to which the newly added data of the first business entity in HBase belongs.
Determine the row key corresponding to the column family to which the newly added data of the first business entity belongs.
Determine the data corresponding to that row key in HBase as the multi-dimensional data of the first business entity.
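The three steps above can be sketched in Python against an in-memory stand-in for the table. The log entry format (a dictionary with `row_key` and `family`) is an assumption for illustration; real HBase log records are structured differently and would need to be parsed first.

```python
def merge_from_log(table, log_entries):
    """Return {row_key: full row} for every row touched by the given log entries."""
    merged = {}
    for entry in log_entries:
        family = entry["family"]    # step 1: column family of the newly added data
        row_key = entry["row_key"]  # step 2: row key associated with that write
        row = table.get(row_key, {})
        if family in row:
            # Step 3: the whole row, across all column families,
            # is the multi-dimensional data of the business entity.
            merged[row_key] = row
    return merged


# Hypothetical table: row key -> column family -> column -> value.
table = {
    "order-1001": {
        "src_payment": {"trade_amount": 25.0},
        "src_logistics": {"delivery_status": "shipped"},
    },
    "order-1002": {
        "src_payment": {"trade_amount": 15.0},
    },
}

# Only order-1001 received new data since the last log acquisition.
log_entries = [{"row_key": "order-1001", "family": "src_logistics"}]
merged = merge_from_log(table, log_entries)
```

Only rows that appear in the log are read, so unchanged entities are not reprocessed.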
In the embodiments of this specification, the data processing device may further include:
A third obtaining module, configured to obtain a streaming computing task, the streaming computing task being used to obtain and process the flow data of the first business entity.
A second determining module, configured to determine each data source of the first business entity, the data source being used to provide the flow data of the first business entity.
A subscription module, configured to subscribe to the data sources on behalf of the streaming computing task.
Based on the same idea, the embodiments of this specification further provide equipment corresponding to the above method. Fig. 4 is a schematic structural diagram of data processing equipment provided by an embodiment of this specification. As shown in Fig. 4, the equipment 400 may include:
At least one processor 410;And
The memory 430 being connect at least one described processor communication;Wherein,
The memory is stored with the instruction 420 that can be executed by least one described processor 410, and described instruction is described
At least one processor 410 execute so that at least one described processor 410 can:
Obtain the flow data of the first business entity from each data source.
Store the flow data in a log-type key-value database, where there is an association relationship among the data of the first business entity in the log-type key-value database.
Obtain log information of the log-type key-value database according to a preset acquisition condition.
Determine, according to the log information and the association relationship, the multi-dimensional data of the first business entity in the log-type key-value database, the multi-dimensional data coming from the data of each data source.
In the embodiments of this specification, the data processing equipment stores the flow data of the first business entity obtained from each data source into the log-type key-value database, and merges the flow data of the first business entity from multiple data sources through efficient operations on that database: writing data, acquiring logs, and looking up the written data from the logs. This can improve flow-data merging efficiency and reduce data processing latency.
The processor 410 in the data processing equipment is further able to:
Obtain relevant dimension data.
Divide the relevant dimension data and the multi-dimensional data of the first business entity to obtain business subdomain data sets.
Publish a detail message of the first business entity, the detail message including the business subdomain data sets.
Summarize, according to the detail message, the data in at least one of the business subdomain data sets to obtain summary data.
In the embodiments of this specification, the data processing equipment may also divide the relevant dimension data and the multi-dimensional data of the first business entity by business subdomain to obtain the business subdomain data sets, and then publish these data sets as the detail message of the first business entity. This simplifies the summarizing step and thereby further improves data processing efficiency and reduces data processing latency. Moreover, when data sources are added, it is only necessary to write the flow data obtained from the newly added data source into the log-type key-value database and to adapt the summarizing logic so that the newly added data is summarized. In other words, when the number of data sources increases, the length of the data link in this data processing equipment stays the same, so maintainability and stability are good.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). However, with the development of technology, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of fabricating integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compiler used in program development. The source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for implementing various functions may be regarded as structures within the hardware component. Alternatively, the means for implementing various functions may be regarded both as software modules implementing a method and as structures within a hardware component.
The systems, devices, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above device is described in terms of units divided by function. Of course, when implementing this application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, equipment (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the description of the method embodiment.
The above is merely an embodiment of this application and is not intended to limit it. For those skilled in the art, various modifications and variations of this application are possible. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within the scope of the claims of this application.