Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a data computation method, which may include the following steps:
Step 101: Receive a data computation request, where the data computation request includes the identifiers of several target data views.
The data computation request further includes the input parameters of the first-layer DS node of each target data view. One data computation request may target one or more data views.
Step 102: According to the preset DAG configuration corresponding to each data view, determine the current-layer DS node of each target data view and the input parameters of that current-layer DS node.
The DAG configuration is the form in which a data view is expressed. A DAG has a layered structure, which makes it convenient to determine the target DS nodes of each layer. A DAG configuration may include multiple layers, and each layer can be processed using the method provided in step 102.
In this embodiment, for the first layer, the input parameters of a DS node are the input parameters of the first-layer DS node of the data view carried in the data computation request; for every layer other than the first, the input parameters of a DS node are the data computation result of the previous layer.
Fig. 2 shows the DAG configuration corresponding to one data view. The business purpose of this data view is: given an input user ID, obtain the commonly used IPs associated with that user ID, and then, from this IP list, obtain the total number of accounts that have logged in using these IPs.
This DAG configuration includes two layers. The execution task of the first-layer DS node is "fetch the user's commonly used IP list", and its input parameter is the user ID. The execution task of the second-layer DS node is "number of accounts that have appeared on the IPs", and its input parameter is the data computation result of the first layer. The final data computation result is "the number of accounts that can be associated through the user ID's commonly used IPs".
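The two-layer configuration of Fig. 2 can be sketched in Python as follows. This is only an illustration: the dictionary layout, the stub functions `fetch_common_ips` and `count_accounts_on_ips`, and the sample login records are assumptions, since the source does not specify a concrete configuration format.

```python
# Hypothetical login records standing in for the business system's data.
LOGINS = [
    ("u1", "10.0.0.1"), ("u1", "10.0.0.2"),
    ("u2", "10.0.0.1"), ("u3", "10.0.0.2"), ("u4", "10.0.0.9"),
]

def fetch_common_ips(user_id):
    """First-layer DS node: fetch the user's commonly used IP list."""
    return sorted({ip for uid, ip in LOGINS if uid == user_id})

def count_accounts_on_ips(ip_list):
    """Second-layer DS node: number of accounts that appeared on the IPs."""
    return len({uid for uid, ip in LOGINS if ip in ip_list})

# Assumed configuration shape: an ordered list of layers, one DS node per layer.
FIG2_DAG_CONFIG = {
    "view_id": "common_ip_account_count",
    "layers": [
        {"ds_node": fetch_common_ips},
        {"ds_node": count_accounts_on_ips},
    ],
}

def evaluate_view(dag_config, first_layer_params):
    """Run the layers in order; each layer's result is the next layer's input."""
    params = first_layer_params
    for layer in dag_config["layers"]:
        params = layer["ds_node"](params)
    return params

result = evaluate_view(FIG2_DAG_CONFIG, "u1")
```

With the sample records, user `u1` has two commonly used IPs, and three distinct accounts have logged in from them, so `result` is 3.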
Note that, in the DAG configuration, the first-layer DS node and the second-layer DS node sit at different levels of the DAG tree; that is, each data view itself has multiple levels to compute. At the logical level, however, both the first-layer DS node and the second-layer DS node belong to the data source layer.
Step 103: According to each current-layer DS node and its input parameters, determine several current-layer target DS nodes and their input parameters, where the current-layer target DS nodes contain no pair of a first DS node and a second DS node such that the first DS node is identical to the second DS node and the input parameters of the first DS node are identical to those of the second DS node.
Note that such a first DS node and second DS node would be duplicate nodes: the two DS nodes correspond to the same execution task and have the same input parameters.
When the DAG configuration includes multiple layers, the target DS nodes and their input parameters must be determined for each layer. Step 103 is described in detail below, taking the first layers of the three DAG configurations shown in Figs. 3 to 5 as an example.
The first-layer DS nodes of the three DAG configurations are DS4, DS1, and DS1, respectively, and the input parameter of each is the user ID. Because DS1 in Fig. 4 is the same as DS1 in Fig. 5, and both nodes take the user ID as input, DS1 in Fig. 4 can be merged with DS1 in Fig. 5 and executed once. The first-layer target DS nodes are therefore DS4 and DS1, each with the user ID as its input parameter. After merging, the number of current-layer target DS nodes is smaller than the number of current-layer DS nodes before merging.
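The merge in step 103 amounts to deduplicating on the pair (DS node, input parameters). A minimal Python sketch, assuming DS nodes are identified by name and that input parameters are hashable; the function name `merge_ds_nodes` and the value `"user-42"` are illustrative only:

```python
def merge_ds_nodes(requests):
    """Collapse (view_id, ds_node, params) triples into unique targets.

    Returns the unique (ds_node, params) pairs in first-seen order, plus a
    map from each view to the target whose result it will consume.
    """
    targets = []
    seen = set()
    view_to_target = {}
    for view_id, ds_node, params in requests:
        key = (ds_node, params)
        if key not in seen:
            seen.add(key)
            targets.append(key)
        view_to_target[view_id] = key
    return targets, view_to_target

# First layers of the three data views of Figs. 3 to 5.
first_layer = [
    (1, "DS4", "user-42"),
    (2, "DS1", "user-42"),
    (3, "DS1", "user-42"),
]
targets, view_to_target = merge_ds_nodes(first_layer)
```

Here three first-layer DS nodes collapse to two targets, DS4 and DS1, and views 2 and 3 both map to the single DS1 target.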
Step 104: Execute each current-layer target DS node according to its input parameters.
In the prior art, the large environmental differences between online business systems and offline business systems mean that data computation logic has to be defined separately: the same data requirement (including complex data configuration logic such as data fetching and data processing) must be developed independently twice, once for each environment. This makes development cost and labor cost high, and true equivalence of the data logic is hard to achieve.
In view of this, the method distinguishes the following two cases according to the application environment:
Case 1: the environment is an online environment.
In this case, step 104 specifically includes:
A1: Call the TR service interface.
A2: Provide the input parameters of the current-layer target DS node to the TR service interface, so that the TR service interface obtains the data that matches those input parameters.
Case 2: the environment is an offline environment.
In this case, step 104 specifically includes:
Filter out, from the offline database, the data that matches the input parameters of the current-layer target DS node.
Taking the DAG configuration shown in Fig. 2 as an example, for the first-layer DS node, when the environment is online, step 104 can be implemented by calling a TR service interface: return IpService.queryIpList(userId). When the environment is offline, step 104 can be implemented with an SQL statement: select ip from table1 where userId="userId".
For the second-layer DS node, when the environment is online, step 104 can be implemented by calling a TR service interface: return IpService.queryUserIdCount(ipList). When the environment is offline, step 104 can be implemented with an SQL statement: select count(userId) from table1 where ip in ipList.
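The first-layer DS node above can be sketched in Python with one entry point and two environment-specific adapters. Everything here is an assumption made for illustration: `StubIpService` stands in for the TR service interface (whose real API the source does not describe), an in-memory SQLite table stands in for the offline database, and the query is parameterized rather than string-built.

```python
import sqlite3

ROWS = [("u1", "10.0.0.1"), ("u1", "10.0.0.2"), ("u2", "10.0.0.1")]

class StubIpService:
    """Hypothetical stand-in for the online TR service interface."""
    def __init__(self, rows):
        self._rows = rows

    def query_ip_list(self, user_id):
        return sorted({ip for uid, ip in self._rows if uid == user_id})

def make_offline_db(rows):
    """Build an in-memory table shaped like table1(userId, ip)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (userId TEXT, ip TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    return conn

def run_first_layer_ds(user_id, environment, service=None, conn=None):
    """Same DS node, adapted once per environment."""
    if environment == "online":
        return service.query_ip_list(user_id)
    if environment == "offline":
        cursor = conn.execute(
            "SELECT DISTINCT ip FROM table1 WHERE userId = ? ORDER BY ip",
            (user_id,),
        )
        return [row[0] for row in cursor.fetchall()]
    raise ValueError(f"unknown environment: {environment}")

online_ips = run_first_layer_ds("u1", "online", service=StubIpService(ROWS))
offline_ips = run_first_layer_ds("u1", "offline", conn=make_offline_db(ROWS))
```

Because the DS node carries only the fetch logic, checking that both adapters return the same list for the same input is a direct way to test online and offline consistency.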
In this embodiment, the data engine supports configuration-driven IO merging, which saves as much IO consumption as possible for each business system. The data engine also supports configuring once and applying the configuration to both online and offline environments, which greatly reduces data development cost and improves the consistency of online and offline data.
Although the DS nodes do need to be adapted separately for the online and offline environments, the following two reasons keep this process simple and controllable without increasing development complexity.
(1) A DS node contains only the most basic data-fetching logic and no complex processing logic, so the offline and online versions are easy to align.
(2) In data computation scenarios, the basic data logic is usually a very small set. Most data is derived through processing.
Note that, to improve data computation efficiency, the current-layer target DS nodes are executed concurrently in step 104.
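One possible way to run the current-layer target DS nodes concurrently is a thread pool, sketched below in Python. The source does not say which concurrency mechanism is used; `execute_targets_concurrently` and the two lambda stubs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def execute_targets_concurrently(targets, ds_functions, max_workers=4):
    """Run every (ds_node_name, params) target in parallel.

    Returns a dict mapping each target to its execution result.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            target: pool.submit(ds_functions[target[0]], target[1])
            for target in targets
        }
        return {target: future.result() for target, future in futures.items()}

# Stub DS nodes; real ones would call a service or query a database.
ds_functions = {
    "DS1": lambda user_id: f"ips-of-{user_id}",
    "DS4": lambda user_id: f"devices-of-{user_id}",
}
results = execute_targets_concurrently(
    [("DS4", "user-42"), ("DS1", "user-42")], ds_functions
)
```

Threads suit this step because DS nodes are IO-bound fetches; the results are keyed by target so that each data view can later look up the result it needs.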
Step 105: According to the execution result of each current-layer target DS node and the DAG configuration corresponding to each target data view, determine the data computation result of the current layer of each target data view.
Step 105 specifically includes:
B1: According to the execution result of each current-layer target DS node and the DAG configuration corresponding to each target data view, determine the execution result corresponding to each target data view.
The DS nodes of each layer can be determined from the DAG configuration of the target data view, and the target DS node corresponding to the target data view can be determined from those DS nodes. The execution result of that target DS node is the execution result corresponding to the target data view.
The execution result corresponding to a target data view is one of two kinds. One is successful execution: the current-layer target DS node corresponding to the target data view obtains the data matching its input parameters within the preset execution time range. The other is failed execution: the current-layer target DS node corresponding to the target data view does not obtain the data matching its input parameters within the execution time range.
B2: Perform data computation according to the execution result corresponding to each target data view and the DAG configuration, to obtain the data computation result of the current layer of each target data view, where the data computations for different target data views are executed serially.
In this embodiment, the preset execution time range bounds the time spent executing target DS nodes, which improves the efficiency of data computation. The execution time range prevents a stalled data computation for one target data view from affecting the progress of the data computations for other target data views. If a DS node has not finished computing within the execution time range, its computation is deferred to the subsequent DS parameter preparation stage, where it is computed serially.
For the two kinds of execution result above, the data computation performed according to the execution result corresponding to each target data view and the DAG configuration divides into the following two cases:
(1) When the current-layer target DS node corresponding to the target data view obtains the data matching its input parameters within the preset execution time range, data computation is performed according to that data and the DAG configuration corresponding to the target data view.
(2) When the current-layer target DS node corresponding to the target data view does not obtain the data matching its input parameters within the execution time range, the current-layer target DS node corresponding to the target data view is re-executed according to its input parameters; when the re-executed node obtains the data matching its input parameters within the execution time range, data computation is performed according to the DAG configuration corresponding to the target data view.
Of course, in practical application scenarios, when the current-layer target DS node corresponding to the target data view does not obtain the data matching its input parameters within the execution time range, the data computation for that target data view may instead be terminated. Note that terminating the data computation for one target data view does not affect the data computations for other target data views.
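The two failure-handling options (re-execute, or terminate the view's computation) can be sketched with a future and a timeout. This is a Python illustration under stated assumptions: the function names, the `SlowOnce` test double, and the convention of returning `None` for a terminated view are all hypothetical, and the time values are chosen only to make the example quick to run.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def run_with_time_range(pool, ds_function, params, time_range_s):
    """Return (True, data) if the DS node finishes in time, else (False, None)."""
    future = pool.submit(ds_function, params)
    try:
        return True, future.result(timeout=time_range_s)
    except FutureTimeout:
        return False, None

def compute_view_layer(pool, ds_function, params, process, time_range_s, retry=True):
    """Case (1): process the data. Case (2): re-execute once, or terminate."""
    ok, data = run_with_time_range(pool, ds_function, params, time_range_s)
    if not ok and retry:
        ok, data = run_with_time_range(pool, ds_function, params, time_range_s)
    if not ok:
        return None  # this view's computation is terminated; others continue
    return process(data)

class SlowOnce:
    """Test double: the first call exceeds the time range, later calls do not."""
    def __init__(self, delay_s):
        self._delay_s = delay_s
        self._lock = threading.Lock()
        self._calls = 0

    def __call__(self, params):
        with self._lock:
            self._calls += 1
            first_call = self._calls == 1
        if first_call:
            time.sleep(self._delay_s)
        return ["10.0.0.1", "10.0.0.2"]

def always_slow(params):
    time.sleep(0.5)
    return []

with ThreadPoolExecutor(max_workers=4) as pool:
    retried = compute_view_layer(pool, SlowOnce(0.5), "user-42", len, time_range_s=0.1)
    terminated = compute_view_layer(
        pool, always_slow, "user-42", len, time_range_s=0.1, retry=False
    )
```

In the first call the initial attempt times out and the re-execution succeeds, so the processing step (`len`) runs on the fetched list; in the second call the view is simply terminated.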
The method abstracts data computation into data-fetching logic and data-processing logic: the data-fetching logic is implemented by DS nodes (the data source layer), and the data-processing logic is implemented by the DAG configuration (the data view layer). When a data computation request is received, the method collects DS nodes (IO nodes) layer by layer according to the DAG configuration and executes the DS nodes after deduplication, which reduces the number of accesses to the business system and lowers the IO consumption of the business system.
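The layer-by-layer collect, deduplicate, and execute loop can be sketched end to end in Python. The view layouts, stub DS functions, and `run_engine` are assumptions for illustration; the per-view processing step is reduced to passing the DS result through unchanged, and DS nodes run sequentially here for brevity.

```python
def run_engine(view_configs, view_ids, first_layer_params, ds_functions, call_log):
    """view_configs maps a view id to its DS node names, one per layer."""
    params = {view: first_layer_params for view in view_ids}
    depth = max(len(view_configs[view]) for view in view_ids)
    for layer in range(depth):
        active = [v for v in view_ids if layer < len(view_configs[v])]
        # Collect this layer's DS nodes, deduplicated on (node, params).
        targets = {}
        for view in active:
            targets.setdefault((view_configs[view][layer], params[view]), None)
        # Execute each unique target exactly once.
        for node, node_params in list(targets):
            call_log.append(node)
            targets[(node, node_params)] = ds_functions[node](node_params)
        # Serially hand each view its result; it becomes the next layer's input.
        for view in active:
            params[view] = targets[(view_configs[view][layer], params[view])]
    return params

# Stub DS nodes returning hashable values so results can serve as dedup keys.
ds_functions = {
    "DS1": lambda user_id: ("10.0.0.1", "10.0.0.2"),
    "DS4": lambda user_id: ("dev-a",),
    "DS2": lambda items: len(items),
    "DS3": lambda items: items[0],
}
view_configs = {"v1": ["DS4", "DS2"], "v2": ["DS1", "DS2"], "v3": ["DS1", "DS3"]}

call_log = []
results = run_engine(view_configs, ["v1", "v2", "v3"], "user-42", ds_functions, call_log)
```

Six DS nodes are configured across the three views, but only five executions occur: DS1 runs once for `v2` and `v3`. DS2 still runs twice, because `v1` and `v2` pass it different inputs and only nodes with identical input parameters are merged.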
This embodiment describes the data computation method in detail, taking the DAG configurations corresponding to the three data views shown in Figs. 3 to 5 as an example. The method includes:
S1: Receive a data computation request, where the data computation request includes the identifiers of several target data views and the input parameters of the first-layer DS node of each target data view.
Assume that the DAG configuration shown in Fig. 3 corresponds to data view 1, the DAG configuration shown in Fig. 4 corresponds to data view 2, and the DAG configuration shown in Fig. 5 corresponds to data view 3.
The data computation request includes the identifiers 1, 2, and 3 of the target data views, and the input parameter of each corresponding first-layer DS node is the user ID.
S2: According to the preset DAG configuration corresponding to each data view, determine the first-layer DS node of each target data view and the input parameters of that first-layer DS node.
The first-layer DS node of target data view 1 is DS4, and its input parameter is the user ID; the first-layer DS node of target data view 2 is DS1, and its input parameter is the user ID; the first-layer DS node of target data view 3 is DS1, and its input parameter is the user ID.
S3: According to each first-layer DS node and its input parameters, determine several first-layer target DS nodes and their input parameters, where the first-layer target DS nodes contain no pair of a first DS node and a second DS node such that the first DS node is identical to the second DS node and the input parameters of the first DS node are identical to those of the second DS node.
The first-layer target DS nodes are DS1 and DS4, and the input parameter of each is the user ID.
S4: Execute each first-layer target DS node according to its input parameters.
Taking target data view 1 as an example, when the environment is online, S4 specifically includes: calling the TR service interface; and providing the user ID to the TR service interface, so that the TR service interface obtains the data matching the user ID.
When the environment is offline, S4 specifically includes: filtering out, from the offline database, the data matching the user ID.
S5: According to the execution result of each first-layer target DS node and the DAG configuration corresponding to each target data view, determine the execution result corresponding to each target data view.
The execution result corresponding to target data view 1 is the execution result of DS4, and the execution result corresponding to target data view 2 and target data view 3 is the execution result of DS1.
S6: Perform data computation according to the execution result corresponding to each target data view and the DAG configuration, to obtain the data computation result of the first layer of each target data view, where the data computations for different target data views are executed serially.
The three target data views are computed serially, but the specific order is not limited. For example, the first-layer data computation results of the three target data views may be computed in the order of target data views 1, 2, 3.
Taking target data view 1 as an example, when DS4 obtains the data matching its input parameters within the preset execution time range, data computation is performed according to that data and the DAG configuration corresponding to target data view 1. Here, the data computation may be data filtering (filter), data validation, and the like.
When DS4 does not obtain the data matching the user ID within the execution time range, the DS4 corresponding to target data view 1 is re-executed according to the user ID corresponding to target data view 1; when the re-executed DS4 obtains the data matching the user ID within the execution time range, data computation is performed according to the DAG configuration corresponding to target data view 1.
After the first-layer data computation of target data view 1 is complete, the first-layer data computations of target data view 2 and target data view 3 are performed in turn.
S7: According to the preset DAG configuration corresponding to each data view, determine the second-layer DS node of each target data view and the input parameters of that second-layer DS node.
The second-layer DS node of target data view 1 is DS2, and its input parameter is the data computation result of its first layer; the second-layer DS node of target data view 2 is DS2, and its input parameter is the data computation result of its first layer; the second-layer DS node of target data view 3 is DS3, and its input parameter is the data computation result of its first layer.
S8: According to each second-layer DS node and its input parameters, determine several second-layer target DS nodes and their input parameters, where the second-layer target DS nodes contain no pair of a first DS node and a second DS node such that the first DS node is identical to the second DS node and the input parameters of the first DS node are identical to those of the second DS node.
The second-layer target DS nodes are DS2 and DS3, and the input parameter of each is the data computation result of the previous layer.
S9: Execute each second-layer target DS node according to its input parameters.
Taking target data view 1 as an example, when the environment is online, S9 specifically includes: calling the TR service interface; and providing the first-layer data computation result to the TR service interface, so that the TR service interface obtains the data matching the first-layer data computation result.
When the environment is offline, S9 specifically includes: filtering out, from the offline database, the data matching the first-layer data computation result.
S10: According to the execution result of each second-layer target DS node and the DAG configuration corresponding to each target data view, determine the execution result corresponding to each target data view.
The execution result corresponding to target data view 1 and target data view 2 is the execution result of DS2, and the execution result corresponding to target data view 3 is the execution result of DS3.
S11: Perform data computation according to the execution result corresponding to each target data view and the DAG configuration, to obtain the data computation result of the second layer of each target data view, where the data computations for different target data views are executed serially.
The second-layer data computation results of the three target data views are computed in the order of target data views 1, 2, 3.
Taking target data view 1 as an example, when DS2 obtains the data matching the first-layer data computation result within the preset execution time range, data computation is performed according to that data and the DAG configuration corresponding to target data view 1. Here, the data computation may be data deduplication, data validation, and the like.
When DS2 does not obtain the data matching the first-layer data computation result within the execution time range, the DS2 corresponding to target data view 1 is re-executed according to the first-layer data computation result corresponding to target data view 1; when the re-executed DS2 obtains the data matching the first-layer data computation result within the execution time range, data computation is performed according to the DAG configuration corresponding to target data view 1.
After the second-layer data computation of target data view 1 is complete, the second-layer data computations of target data view 2 and target data view 3 are performed in turn.
As shown in Fig. 6, a data computation engine includes:
a receiving unit 601, configured to receive a data computation request, where the data computation request includes the identifiers of several target data views;
a determining unit 602, configured to determine, according to the preset directed acyclic graph (DAG) configuration corresponding to each data view, the current-layer DS node of each target data view and the input parameters of that current-layer DS node;
a merging unit 603, configured to determine, according to each current-layer DS node and its input parameters, several current-layer target DS nodes and their input parameters, where the current-layer target DS nodes contain no pair of a first DS node and a second DS node such that the first DS node is identical to the second DS node and the input parameters of the first DS node are identical to those of the second DS node;
an execution unit 604, configured to execute each current-layer target DS node according to its input parameters;
a computing unit 605, configured to determine, according to the execution result of each current-layer target DS node and the DAG configuration corresponding to each target data view, the data computation result of the current layer of each target data view.
In one embodiment of the present invention, the computing unit 605 is configured to determine, according to the execution result of each current-layer target DS node and the DAG configuration corresponding to each target data view, the execution result corresponding to each target data view; and to perform data computation according to the execution result corresponding to each target data view and the DAG configuration, to obtain the data computation result of the current layer of each target data view, where the data computations for different target data views are executed serially.
In one embodiment of the present invention, the computing unit 605 is configured to, when the current-layer target DS node corresponding to the target data view obtains the data matching its input parameters within the preset execution time range, perform data computation according to that data and the DAG configuration corresponding to the target data view.
In one embodiment of the present invention, the computing unit 605 is further configured to, when the current-layer target DS node corresponding to the target data view does not obtain the data matching its input parameters within the execution time range, re-execute the current-layer target DS node corresponding to the target data view according to its input parameters; and, when the re-executed node obtains the data matching its input parameters within the execution time range, perform data computation according to the DAG configuration corresponding to the target data view.
In one embodiment of the present invention, when the environment is online, the execution unit 604 is configured to call the TR service interface, and to provide the input parameters of the current-layer target DS node to the TR service interface, so that the TR service interface obtains the data matching the input parameters of the current-layer target DS node.
In one embodiment of the present invention, when the environment is offline, the execution unit 604 is configured to filter out, from the offline database, the data matching the input parameters of the current-layer target DS node.
An embodiment of the present invention provides a data computation device, including a processor and a memory.
The memory is configured to store execution instructions, and the processor is configured to execute the execution instructions stored in the memory to implement the method of any one of the above embodiments.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). As technology has developed, however, many improvements to method flows can now be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is such an integrated circuit, whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing the logical method flow can be readily obtained merely by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller can be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices included in it for implementing various functions can also be regarded as structures within the hardware component. Alternatively, the devices for implementing various functions can even be regarded as both software modules implementing a method and structures within a hardware component.
The systems, devices, modules, or units described in the above embodiments can be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above device is described with its functions divided into various units. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include non-persistent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, commodity, or device that includes the element.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for the same or similar parts of the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for related details, refer to the corresponding description of the method embodiment.
The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, various modifications and variations of the present application are possible. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.