CN116755804B - Assembled integrated big data processing method and system - Google Patents
- Publication number
- CN116755804B (application CN202310803713.0A)
- Authority
- CN
- China
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/448—Execution paradigms, e.g. implementations of programming paradigms
- G06F9/4488—Object-oriented
- G06F9/449—Object-oriented method invocation or resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/252—Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44521—Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
- G06F9/44526—Plug-ins; Add-ons
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/547—Remote procedure calls [RPC]; Web services
- G06F9/548—Object oriented; Remote method invocation [RMI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/549—Remote execution
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides an assembled integrated big data processing method and system. The system comprises: a first definition module; a second definition module for predefining data objects; a dividing module; a data processing unit generation module; a data processing unit dependency relationship table establishing module; and a data processing unit execution module. The assembled integrated big data processing method and system have the following advantage: the invention provides a big data processing system and method that are flexibly assembled from scenes, technologies and algorithms, enabling the integrated construction of applications that fuse different big data technologies as well as of complex application scenarios, and effectively improving system development efficiency.
Description
Technical Field
The invention belongs to the technical field of big data processing, and particularly relates to an assembled integrated big data processing method and system.
Background
In recent years, big data technology has developed rapidly, and informatization has entered a stage of "large-scale integration, high sharing and deep application". With the wide adoption of the Internet of Things and smart devices, structured, semi-structured and unstructured data of all kinds are growing explosively. A big data processing system with stronger processing capacity, easier scalability and higher performance can meet the demands of heavy computation, large storage and high load, enable the analysis and mining of massive data, and maximize the value of that data. However, big data processing technologies are numerous, and different technologies suit different types of data and different application scenarios. In addition, big data processing involves multiple stages, such as data acquisition, convergence, cleaning, aggregation and analysis, each requiring different technical and business knowledge, so designing and developing a single big data processing system is technically very difficult. At present, enterprises generally build multiple systems, including but not limited to a data acquisition system, a data management system, a data analysis system and a data index system, which different teams implement step by step. As a result, construction periods are long, costs are high, and it is hard to respond quickly to rapidly changing data processing requirements and to iterative technology updates.
Therefore, the field of big data processing urgently needs a solution to the following technical problem: given the wide variety of big data technologies and of big data processing requirements, provide a method for constructing a big data processing system that can integrate multiple big data technologies, satisfy diverse application scenarios and respond quickly to business and technology changes, thereby realizing an integrated data processing mechanism that covers every stage of data processing.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an assembled integrated big data processing method and system that can effectively solve the above problems.
The technical scheme adopted by the invention is as follows:
The invention provides an assembled integrated big data processing method, comprising the following steps:
Step 1: predefine data processing scenes, data processing services, data processing technology components and data processing algorithm plug-ins. Each data processing service provides a service calling interface through which it is invoked; each data processing technology component provides a component calling interface through which it is invoked; and each data processing algorithm plug-in provides a plug-in calling interface through which it is invoked;
Step 2: predefine data objects. Specifically, data source objects and data storage target objects are registered uniformly as data objects, each with a unique data object identifier;
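As an illustration of Step 2, the following Python sketch registers data sources and storage targets in a single registry keyed by a generated identifier. The class names, the `kind` field and the use of `uuid4` for identifiers are assumptions made for this example; the method itself only requires that every data object have a unique identifier.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """A registered data source or data storage target (hypothetical fields)."""
    object_id: str   # unique data object identifier
    name: str        # human-readable name, e.g. a table or topic
    kind: str        # hypothetical type tag, e.g. "mysql_table" or "hdfs_path"


class DataObjectRegistry:
    """Registers sources and targets uniformly, keyed by a unique identifier."""

    def __init__(self) -> None:
        self._objects: dict[str, DataObject] = {}

    def register(self, name: str, kind: str) -> DataObject:
        # A random UUID makes each identifier unique even for identical names.
        obj = DataObject(object_id=uuid.uuid4().hex, name=name, kind=kind)
        self._objects[obj.object_id] = obj
        return obj

    def get(self, object_id: str) -> DataObject:
        return self._objects[object_id]
```

Because sources and targets share one registry, the same identifier can later serve as the target of one unit and the source of another.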
Step 3: load the data processing flow model to be processed and divide it into n data processing scenes;
A data processing unit is generated for each data processing scene as follows:
1) From the data objects defined in Step 2, determine the data objects that serve as the data source of the data processing unit (called data source objects) and the data objects that serve as storage targets for the results produced when the unit executes (called data storage target objects);
2) Select the required data processing services, data processing technology components and data processing algorithm plug-ins according to the requirements of the data processing scene, assemble and package them according to the calling relationships among them, and associate the data source objects and data storage target objects, yielding a configured data processing unit;
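The configured data processing unit produced by 1) and 2) can be pictured as a small record that binds the selected call chain to its data objects. This is a minimal sketch; the field names, the `assemble_unit` helper and the `name@version` strings are hypothetical, not taken from the invention.

```python
from dataclasses import dataclass


@dataclass
class DataProcessingUnit:
    """One assembled unit per data processing scene (hypothetical field names)."""
    unit_id: str
    scene: str                   # e.g. "acquisition", "cleaning", "analysis"
    service: str                 # selected data processing service, "name@version"
    component: str               # technology component called by the service
    plugin: str                  # algorithm plug-in called by the component
    source_ids: frozenset[str]   # data source object identifiers
    target_ids: frozenset[str]   # data storage target object identifiers


def assemble_unit(unit_id, scene, service, component, plugin, sources, targets):
    """Package the chain service -> component -> plug-in with its data objects."""
    return DataProcessingUnit(
        unit_id=unit_id,
        scene=scene,
        service=service,
        component=component,
        plugin=plugin,
        source_ids=frozenset(sources),
        target_ids=frozenset(targets),
    )
```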
Step 4: build a data processing unit dependency relationship table from the associations between the data source objects and data storage target objects of the data processing units. The table stores, for each data processing unit, its dependency degree and its dependency relationships with the other units, and marks its execution state, which is either unexecuted or execution completed;
Specifically, for the i-th of the n data processing units (i = 1, 2, 3, …, n), the dependency degree and dependency relationships are computed as follows:
For each j-th data processing unit among the other n-1 units (j = 1, 2, 3, …, n, j ≠ i): if some data object serving as a data storage target object of the j-th unit is the same as some data object serving as a data source object of the i-th unit, then the i-th unit depends on the j-th unit, and the dependency degree of the i-th unit is increased by 1. Comparing the i-th unit against each of the other n-1 units in turn yields its dependency degree and dependency relationships;
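The dependency computation in Step 4 amounts to a pairwise comparison of each unit's data source objects with every other unit's data storage target objects. A minimal Python sketch, assuming each unit is summarized by two sets of data object identifiers (the `DependencyEntry` and `build_dependency_table` names are hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class DependencyEntry:
    """One row of the data processing unit dependency relationship table."""
    degree: int = 0                                      # units this unit depends on
    depends_on: list[str] = field(default_factory=list)  # ids of those units
    executed: bool = False                               # unexecuted / completed


def build_dependency_table(units):
    """units: mapping unit_id -> (source_ids, target_ids), each a set of object ids."""
    table = {uid: DependencyEntry() for uid in units}
    for i, (sources_i, _) in units.items():
        for j, (_, targets_j) in units.items():
            if j == i:
                continue
            # Unit i depends on unit j if any target of j is also a source of i.
            if targets_j & sources_i:
                table[i].depends_on.append(j)
                table[i].degree += 1
    return table
```

As in the text, the degree grows by 1 per depended-on unit, not per shared data object, and the comparison costs O(n²) unit pairs.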
Step 5: read the data processing unit dependency relationship table, and collect every data processing unit that is marked unexecuted and has a dependency degree of 0 into a data processing unit set;
Step 6: traverse the data processing unit set and execute each unit in it asynchronously. When a unit finishes executing, it stores its execution result in the corresponding data storage target object, and its execution state in the dependency relationship table is updated to execution completed;
In this step, each data processing unit is executed as follows:
1) Read the data source according to the configured data source object;
2) Call the data processing service through service gateway routing; the data processing service loads and calls the corresponding data processing technology component through reflection, and the data processing technology component loads and calls the corresponding data processing algorithm plug-in through reflection;
The called data processing algorithm plug-in processes the data source and returns an intermediate result to the data processing technology component that called it; the component further processes the intermediate result to obtain the data processing result and returns it to the data processing service that called it. The data processing service thus obtains the data processing result, which is the execution result;
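The execution chain in 2) (gateway routing, then reflective loading of the component and the plug-in) could look roughly like this in Python, where `importlib.import_module` plus `getattr` plays the role reflection plays in, for example, Java. Everything named here is hypothetical: the route table `SERVICE_ROUTES`, the classes `CleaningService`, `DedupComponent` and `UpperCasePlugin`, and the `invoke`/`process` methods standing in for the unified calling interfaces. A real deployment would route through an actual service gateway rather than a dictionary.

```python
import importlib


def load_class(qualified_name):
    """Resolve a class by name at runtime (Python's analogue of reflective loading).

    "package.module.ClassName" is imported with importlib; a bare "ClassName"
    is looked up among the names defined in this module.
    """
    module_name, _, class_name = qualified_name.rpartition(".")
    if module_name:
        return getattr(importlib.import_module(module_name), class_name)
    return globals()[class_name]


class UpperCasePlugin:
    """Hypothetical algorithm plug-in: produces the intermediate result."""
    def process(self, records):
        return [r.upper() for r in records]


class DedupComponent:
    """Hypothetical technology component: loads its plug-in by name, then refines."""
    def __init__(self, plugin_name):
        self.plugin = load_class(plugin_name)()

    def process(self, records):
        intermediate = self.plugin.process(records)  # plug-in's intermediate result
        return sorted(set(intermediate))             # component's data processing result


class CleaningService:
    """Hypothetical data processing service: loads its component by name."""
    def __init__(self, component_name, plugin_name):
        self.component = load_class(component_name)(plugin_name)

    def invoke(self, records):
        return self.component.process(records)       # the execution result


# Hypothetical gateway routing table: route -> (service, component, plug-in).
SERVICE_ROUTES = {"/clean": ("CleaningService", "DedupComponent", "UpperCasePlugin")}


def gateway_call(route, records):
    service_name, component_name, plugin_name = SERVICE_ROUTES[route]
    service = load_class(service_name)(component_name, plugin_name)
    return service.invoke(records)
```

For example, `gateway_call("/clean", ["b", "a", "B"])` upper-cases the records in the plug-in and deduplicates and sorts them in the component, returning `["A", "B"]`.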
Step 7: read the current data processing unit dependency relationship table and determine whether any data processing unit is still marked unexecuted. If none is, end the flow; otherwise, go to Step 8;
Step 8: update the dependency degrees and dependency relationships in the table to obtain an updated data processing unit dependency relationship table, as follows:
For the i-th of the n data processing units (i = 1, 2, 3, …, n) whose execution has completed, find in the dependency relationship table each data processing unit that is still unexecuted and depends on the i-th unit, and decrease its dependency degree by 1;
Step 9: return to Step 5 and repeat.
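Steps 5 to 9 describe what is essentially a level-by-level topological execution of the dependency table, similar to Kahn's algorithm. The sketch below is one possible reading under stated assumptions: the table is a plain dict, the units of each ready set run concurrently on a `ThreadPoolExecutor`, and the loop waits for the whole set before updating. Step 8 is interpreted as decrementing a dependent's degree once for each unit completed in the round just finished, so that a completed unit is never counted twice across rounds. The cycle check is an addition not described in the method.

```python
from concurrent.futures import ThreadPoolExecutor


def run_units(table, execute, max_workers=4):
    """
    table:   unit_id -> {"degree": int, "depends_on": list[str], "executed": bool}
    execute: callable(unit_id) that runs one unit and stores its result.
    Returns the ready sets in the order they were executed.
    """
    batches = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            # Step 5: unexecuted units whose dependency degree is 0.
            ready = [u for u, row in table.items()
                     if not row["executed"] and row["degree"] == 0]
            if not ready:
                # Step 7: stop when nothing is left; if units remain but none is
                # ready, the dependencies must contain a cycle.
                if any(not row["executed"] for row in table.values()):
                    raise RuntimeError("cyclic dependency between data processing units")
                return batches
            # Step 6: execute the ready set asynchronously and wait for all of it.
            list(pool.map(execute, ready))
            for u in ready:
                table[u]["executed"] = True
            batches.append(sorted(ready))
            # Step 8: decrement once per dependency completed in this round.
            done = set(ready)
            for row in table.values():
                if not row["executed"]:
                    row["degree"] -= sum(1 for d in row["depends_on"] if d in done)
```

Waiting for the whole ready set is the simplest barrier; a finer-grained variant could update the table as each future completes.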
Preferably, each class of data processing service is configured with multiple versions; each class of data processing technology component is configured with multiple versions; and each class of data processing algorithm plug-in is configured with multiple versions.
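One way to hold several versions per class of service, component or plug-in is a two-level registry keyed by class name and then by version. The `VersionedRegistry` name and the default of selecting the highest numeric version are assumptions for illustration; the invention does not state how a version is chosen.

```python
class VersionedRegistry:
    """Keeps several versions of each service/component/plug-in class (hypothetical)."""

    def __init__(self):
        self._entries = {}  # class name -> {version string: implementation}

    def register(self, name, version, impl):
        self._entries.setdefault(name, {})[version] = impl

    def resolve(self, name, version=None):
        versions = self._entries[name]
        if version is None:
            # Assumed default: highest version, compared numerically so "1.10.0" > "1.2.0".
            version = max(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
        return versions[version]
```

Pinning an explicit version lets a data processing unit keep running against a known implementation while newer versions are registered alongside it.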
Preferably, the data processing scenes include a data acquisition scene, a data synchronization scene, a data aggregation scene, a data cleaning scene and a data analysis scene.
Preferably, the data acquisition scene acquires data from a data source and stores it in a designated data storage target object;
The data synchronization scene synchronizes data from a data source among different data storage target objects;
The data aggregation scene builds dimensions over the base data and constructs business wide tables by associating those dimensions;
The data cleaning scene cleans and standardizes dirty data;
The data analysis scene performs data mining and analysis.
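For the data aggregation scene, associating dimensions to build a business wide table is essentially a left join of fact records against dimension lookups. A minimal pure-Python sketch with hypothetical field names; a production system would more likely do this in a SQL engine or a distributed framework.

```python
def build_wide_table(facts, dimensions):
    """
    facts:      list of dicts carrying foreign keys, e.g.
                {"order_id": 1, "customer_id": 7, "amount": 30}
    dimensions: {foreign_key: {key_value: {attribute: value, ...}}}
    Returns one flat row per fact with the matching dimension attributes
    joined in; facts without a match keep only their own columns (left join).
    """
    wide = []
    for fact in facts:
        row = dict(fact)
        for fk, lookup in dimensions.items():
            row.update(lookup.get(fact.get(fk), {}))
        wide.append(row)
    return wide
```

In this sketch a dimension attribute overwrites a fact column with the same name, so real schemas would need distinct column names.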
Preferably, the assembly and packaging method is as follows:
Build multiple classes of data processing services under each data processing scene, with multiple versions of each class of data processing service, to adapt to different runtime environments and changing business scenarios;
Build multiple classes of data processing technology components under each class of data processing service. Following a unified interface standard, each class of component is loaded and called by the data processing service through reflection, and each class of component has multiple versions to adapt to changes in the data processing service;
Build multiple classes of data processing algorithm plug-ins under each class of data processing technology component. Following a unified interface standard, each class of plug-in is loaded and called by the technology component through reflection. Plug-ins are of two types according to how they are called: the first is internal loading, which requires the plug-in to be implemented with the same technology as the technology component, so that the component can load and call it in-process; the second is an HTTP call, in which the technology component calls the plug-in through an HTTP interface. Each class of plug-in has multiple versions to accommodate extensions and changes to the data processing logic.
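The two calling modes for algorithm plug-ins can be sketched as a single dispatch function. This is an assumption-laden illustration: the `spec` dictionary, its `mode`/`target`/`timeout` keys and the JSON-in, JSON-out HTTP contract are hypothetical; only the standard-library calls (`importlib.import_module`, `urllib.request`) are real.

```python
import importlib
import json
import urllib.request


def call_plugin(spec, payload):
    """
    spec["mode"] == "internal": spec["target"] is "module.function", loaded in-process
                                (plug-in built with the same technology as the component).
    spec["mode"] == "http":     spec["target"] is a URL accepting a JSON POST (hypothetical contract).
    """
    if spec["mode"] == "internal":
        module_name, _, func_name = spec["target"].rpartition(".")
        func = getattr(importlib.import_module(module_name), func_name)
        return func(payload)
    if spec["mode"] == "http":
        request = urllib.request.Request(
            spec["target"],
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=spec.get("timeout", 30)) as resp:
            return json.loads(resp.read().decode("utf-8"))
    raise ValueError(f"unknown calling mode: {spec['mode']}")
```

For instance, `call_plugin({"mode": "internal", "target": "statistics.mean"}, [1, 2, 3])` loads `statistics.mean` by name and returns 2; the HTTP branch decouples the plug-in's implementation language from the component's.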
The invention also provides a system implementing the assembled integrated big data processing method, comprising:
A first definition module, used to predefine data processing scenes, data processing services, data processing technology components and data processing algorithm plug-ins. Each data processing service provides a service calling interface through which it is invoked; each data processing technology component provides a component calling interface through which it is invoked; and each data processing algorithm plug-in provides a plug-in calling interface through which it is invoked;
A second definition module, used to predefine data objects. Specifically, data source objects and data storage target objects are registered uniformly as data objects, each with a unique data object identifier;
A dividing module, used to load the data processing flow model to be processed and divide it into n data processing scenes;
A data processing unit generation module, used to generate a data processing unit for each data processing scene as follows:
1) From the defined data objects, determine the data objects that serve as the data source of the data processing unit (data source objects) and the data objects that serve as storage targets for the results of executing the unit (data storage target objects);
2) Select the required data processing services, data processing technology components and data processing algorithm plug-ins according to the requirements of the data processing scene, assemble and package them according to the calling relationships among them, and associate the data source objects and data storage target objects, yielding a configured data processing unit;
A data processing unit dependency relationship table establishing module, used to build a data processing unit dependency relationship table from the associations between the data source objects and data storage target objects of the data processing units. The table stores, for each data processing unit, its dependency degree and its dependency relationships with the other units, and marks its execution state, which is either unexecuted or execution completed;
Specifically, for the i-th of the n data processing units (i = 1, 2, 3, …, n), the dependency degree and dependency relationships are computed as follows:
For each j-th data processing unit among the other n-1 units (j = 1, 2, 3, …, n, j ≠ i): if some data object serving as a data storage target object of the j-th unit is the same as some data object serving as a data source object of the i-th unit, then the i-th unit depends on the j-th unit, and the dependency degree of the i-th unit is increased by 1. Comparing the i-th unit against each of the other n-1 units in turn yields its dependency degree and dependency relationships;
A data processing unit execution module, which performs the following steps:
Step 1: read the data processing unit dependency relationship table, and collect every data processing unit that is marked unexecuted and has a dependency degree of 0 into a data processing unit set;
Step 2: traverse the data processing unit set and execute each unit in it asynchronously. When a unit finishes executing, it stores its execution result in the corresponding data storage target object, and its execution state in the dependency relationship table is updated to execution completed;
In this step, each data processing unit is executed as follows:
1) Read the data source according to the configured data source object;
2) Call the data processing service through service gateway routing; the data processing service loads and calls the corresponding data processing technology component through reflection, and the data processing technology component loads and calls the corresponding data processing algorithm plug-in through reflection;
The called data processing algorithm plug-in processes the data source and returns an intermediate result to the data processing technology component that called it; the component further processes the intermediate result to obtain the data processing result and returns it to the data processing service that called it. The data processing service thus obtains the data processing result, which is the execution result;
Step 3: read the current data processing unit dependency relationship table and determine whether any data processing unit is still marked unexecuted. If none is, end the flow; otherwise, go to Step 4;
Step 4: update the dependency degrees and dependency relationships in the table to obtain an updated data processing unit dependency relationship table, as follows:
For the i-th of the n data processing units (i = 1, 2, 3, …, n) whose execution has completed, find in the dependency relationship table each data processing unit that is still unexecuted and depends on the i-th unit, and decrease its dependency degree by 1;
Step 5: return to Step 1 and repeat.
The assembled integrated big data processing method and system have the following advantage:
The invention provides a big data processing system and method that are flexibly assembled from scenes, technologies and algorithms, enabling the integrated construction of applications that fuse different big data technologies as well as of complex application scenarios, and effectively improving system development efficiency.
Drawings
FIG. 1 is a schematic flow chart of an assembled integrated big data processing method provided by the invention;
FIG. 2 is a schematic diagram of the logical structure of a data processing unit according to the present invention.
Detailed Description
To make the technical problems addressed, the technical solutions and the beneficial effects of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein serve only to illustrate the invention and are not intended to limit its scope.
Big data processing systems face diverse application scenarios, complex business logic that is hard to process consistently, and a huge, fast-iterating big data technology ecosystem whose parts are hard to keep compatible. To address these problems, the invention provides a big data processing system and method that are flexibly assembled from scenes, technologies and algorithms, enabling the integrated construction of applications that fuse different big data technologies as well as of complex application scenarios, and effectively improving system development efficiency.
The invention provides an assembled integrated big data processing method, referring to FIG. 1, comprising the following steps:
Step 1: predefine data processing scenes, data processing services, data processing technology components and data processing algorithm plug-ins. Each data processing service provides a service calling interface through which it is invoked; each data processing technology component provides a component calling interface through which it is invoked; and each data processing algorithm plug-in provides a plug-in calling interface through which it is invoked;
In a specific implementation, each class of data processing service is configured with multiple versions; each class of data processing technology component is configured with multiple versions; and each class of data processing algorithm plug-in is configured with multiple versions.
Step 2: predefine data objects. Specifically, data source objects and data storage target objects are registered uniformly as data objects, each with a unique data object identifier;
Step 3: load the data processing flow model to be processed and divide it into n data processing scenes;
Specifically, the data processing scenes include, but are not limited to, a data acquisition scene, a data synchronization scene, a data aggregation scene, a data cleaning scene and a data analysis scene.
The data acquisition scene acquires data from a data source and stores it in a designated data storage target object;
The data synchronization scene synchronizes data from a data source among different data storage target objects;
The data aggregation scene builds dimensions over the base data and constructs business wide tables by associating those dimensions;
The data cleaning scene cleans and standardizes dirty data;
The data analysis scene performs data mining and analysis, including but not limited to: a data association analysis scene, which performs association analysis across multiple data tables; an index multidimensional aggregation analysis scene, which aggregates index items (metrics) along different dimensions; and a data mining analysis scene, which performs in-depth mining of the data with machine learning algorithms.
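The index multidimensional aggregation analysis scene corresponds to grouping records by a chosen set of dimensions and summing a metric, much as a SQL `GROUP BY` would. A minimal sketch with hypothetical column names:

```python
from collections import defaultdict


def aggregate(rows, dimensions, measure):
    """Sum `measure` for every combination of values of `dimensions` (a GROUP BY)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)  # one group per dimension-value tuple
        totals[key] += row[measure]
    return dict(totals)
```

Calling it with `["region"]` and then with `["month"]` over the same rows gives the same metric aggregated along two different dimensions.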
A data processing unit is generated for each data processing scene as follows:
1) From the data objects defined in Step 2, determine the data objects that serve as the data source of the data processing unit (called data source objects) and the data objects that serve as storage targets for the results produced when the unit executes (called data storage target objects);
2) Select the required data processing services, data processing technology components and data processing algorithm plug-ins according to the requirements of the data processing scene, assemble and package them according to the calling relationships among them, and associate the data source objects and data storage target objects, yielding a configured data processing unit;
The assembly and packaging method is as follows:
Build multiple classes of data processing services under each data processing scene, with multiple versions of each class of data processing service, to adapt to different runtime environments and changing business scenarios;
Build multiple classes of data processing technology components under each class of data processing service. Following a unified interface standard, each class of component is loaded and called by the data processing service through reflection, and each class of component has multiple versions to adapt to changes in the data processing service;
Build multiple classes of data processing algorithm plug-ins under each class of data processing technology component. Following a unified interface standard, each class of plug-in is loaded and called by the technology component through reflection. Plug-ins are of two types according to how they are called: the first is internal loading, which requires the plug-in to be implemented with the same technology as the technology component, so that the component can load and call it in-process; the second is an HTTP call, in which the technology component calls the plug-in through an HTTP interface. Each class of plug-in has multiple versions to accommodate extensions and changes to the data processing logic.
Step 4, establish a data processing unit dependency relationship table from the associations between the data source objects and data storage target objects of the data processing units. The table stores the dependency degree of each data processing unit and its dependency relationships with the other units, and records the execution state of each unit, which is either unexecuted or execution completed;
Specifically, for the ith of the n data processing units, i = 1, 2, 3, …, n, the dependency degree and dependency relationships are calculated as follows:
For the jth of the other n-1 data processing units, j ≠ i, j = 1, 2, 3, …, n: if some data object that serves as a data storage target object of the jth unit is the same as some data object that serves as a data source object of the ith unit, the ith unit depends on the jth unit, and the dependency degree of the ith unit is increased by 1. Comparing the ith unit against each of the other n-1 units in turn yields its dependency degree and dependency relationships;
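The dependency degree calculation of Step 4 amounts to comparing every ordered pair of units. A minimal sketch, assuming each unit is reduced to a hypothetical `Unit` record that holds only the identifiers of its source and target data objects:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Unit:
    """Hypothetical minimal unit: a name plus source and target data object IDs."""
    name: str
    sources: Set[str]
    targets: Set[str]


@dataclass
class TableRow:
    """One row of the dependency relationship table."""
    degree: int = 0
    depends_on: Set[str] = field(default_factory=set)
    executed: bool = False   # False = unexecuted, True = execution completed


def build_dependency_table(units: List[Unit]) -> Dict[str, TableRow]:
    table = {u.name: TableRow() for u in units}
    for ui in units:
        for uj in units:
            if uj is ui:
                continue                      # j != i
            # Unit i depends on unit j if some target of j is a source of i.
            if uj.targets & ui.sources:
                table[ui.name].degree += 1    # +1 per upstream unit, not per shared object
                table[ui.name].depends_on.add(uj.name)
    return table
```

Note that, as in the text, the degree counts upstream units: if unit j writes two objects that unit i reads, the degree of i still rises by only 1. Applied to the D1 to D5 example that follows, this yields degrees 0, 1 and 2 for P1, P2 and P3.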
Step 5, read the data processing unit dependency relationship table and collect every data processing unit that is marked as unexecuted and has a dependency degree of 0 into a data processing unit set;
Step 6, traverse the data processing unit set and execute each unit in it asynchronously. When a unit finishes executing, it stores its execution result in the corresponding data storage target object, and its execution state in the dependency relationship table is updated to execution completed;
In this step, each data processing unit is executed as follows:
1) Read the data source according to the configured data source object;
2) Call the data processing service through service gateway routing; the data processing service loads and invokes the corresponding data processing technology component through reflection, and the technology component loads and invokes the corresponding data processing algorithm plug-in through reflection;
The invoked algorithm plug-in processes the data source and returns an intermediate result to the invoking technology component; the technology component processes the intermediate result further to obtain the data processing result and returns it to the invoking data processing service. The data processing result obtained by the service is the execution result of the unit;
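The call chain of a single unit (gateway, service, technology component, plug-in) can be sketched as follows. All class names and the `/clean/v1` route are hypothetical; the gateway is modelled as an in-process dictionary, and classes are wired by direct reference for brevity instead of being resolved by reflection.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class DropEmptyPlugin:
    """Hypothetical algorithm plug-in: drops records whose values are all empty."""

    def process(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if any(v not in (None, "") for v in r.values())]


class CleaningComponent:
    """Hypothetical technology component: runs the plug-in, then refines its output."""

    def __init__(self, plugin: DropEmptyPlugin) -> None:
        self.plugin = plugin

    def run(self, rows: List[Row]) -> List[Row]:
        intermediate = self.plugin.process(rows)   # plug-in returns an intermediate result
        # Component-level step on the intermediate result: trim string values.
        return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
                for r in intermediate]


class CleaningService:
    """Hypothetical data processing service wrapping one configured component."""

    def __init__(self, component: CleaningComponent) -> None:
        self.component = component

    def handle(self, rows: List[Row]) -> List[Row]:
        return self.component.run(rows)


# Stand-in for the service gateway: route -> factory for the routed service.
GATEWAY: Dict[str, Callable[[], CleaningService]] = {
    "/clean/v1": lambda: CleaningService(CleaningComponent(DropEmptyPlugin())),
}


def execute_unit(route: str, read_source: Callable[[], List[Row]]) -> List[Row]:
    rows = read_source()          # 1) read the configured data source
    service = GATEWAY[route]()    # 2) route the call through the gateway
    return service.handle(rows)   # the service's result is the unit's execution result
```

For instance, `execute_unit("/clean/v1", lambda: [{"a": " x "}, {"a": ""}])` drops the empty record in the plug-in and trims the remaining value in the component, returning `[{"a": "x"}]`.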
Step 7, read the current data processing unit dependency relationship table and determine whether any data processing unit is still marked as unexecuted. If none is, end the flow; otherwise, go to Step 8;
Step 8, update the dependency degrees and dependency relationships in the data processing unit dependency relationship table to obtain an updated table, as follows:
For each of the n data processing units (the ith, i = 1, 2, 3, …, n) that has just completed execution, find in the dependency relationship table every unit that is unexecuted and depends on the ith unit, and decrease the dependency degree of each such unit by 1;
Step 9, return to Step 5 and repeat.
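Steps 5 to 9 form a set-by-set variant of topological scheduling (Kahn's algorithm). A minimal sketch under two assumptions: the units in one set run on a thread pool and the loop waits for the whole set before checking for remaining work (the text does not say whether Step 7 waits), and `execute` is a hypothetical callback that runs one unit and writes its results. The cycle check is also an addition: without it, a cyclic dependency table would make Step 5 return an empty set forever.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set


def run_all(degree: Dict[str, int], depends_on: Dict[str, Set[str]],
            execute: Callable[[str], None], max_workers: int = 4) -> List[List[str]]:
    """Execute units set by set; return the names executed in each set."""
    degree = dict(degree)          # work on a copy of the dependency degrees
    done: Set[str] = set()
    waves: List[List[str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(done) < len(degree):            # Step 7: any unexecuted unit left?
            # Step 5: unexecuted units with dependency degree 0.
            ready = sorted(u for u, d in degree.items() if d == 0 and u not in done)
            if not ready:
                raise RuntimeError("cyclic dependency: no runnable unit left")
            # Step 6: run the set asynchronously, then wait for every unit in it.
            futures = [pool.submit(execute, u) for u in ready]
            for fut in futures:
                fut.result()                       # re-raises a unit's exception
            done.update(ready)                     # mark as execution completed
            waves.append(ready)
            # Step 8: each just-finished unit decrements its unexecuted dependants.
            for finished in ready:
                for u in degree:
                    if u not in done and finished in depends_on[u]:
                        degree[u] -= 1
    return waves                                   # Step 9 is the while loop itself
```

A finer-grained design would decrement dependants as soon as each individual unit finishes rather than after the whole set, which shortens the critical path when units in one set have very different run times.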
For ease of understanding, consider the following example with data objects D1, D2, D3, D4 and D5 and data processing units P1, P2 and P3:
The data source of P1 is D1, and its data storage targets are D2 and D3;
The data source of P2 is D2, and its data storage target is D4;
The data sources of P3 are D3 and D4, and its data storage target is D5.
Then:
For P1, the data source is D1, which is not a data storage target of P2 or P3, so the dependency degree of P1 is 0 and P1 depends on neither P2 nor P3;
For P2, the data source is D2, which is also a data storage target of P1, so the dependency degree of P2 is 1 and P2 depends on P1;
For P3, the data sources are D3 and D4; D3 is also a data storage target of P1 and D4 is also a data storage target of P2, so the dependency degree of P3 is 2 and P3 depends on P1 and P2;
The resulting data processing unit dependency relationship table is shown in Table 1. Its second row describes the dependency degree and dependency relationships of P1, the third row those of P2, and the fourth row those of P3; a √ in a column means that the unit in that row depends on the unit in that column;
TABLE 1
Unit | P1 | P2 | P3 | Degree of dependence | Execution state |
---|---|---|---|---|---|
P1 | | | | 0 | Unexecuted |
P2 | √ | | | 1 | Unexecuted |
P3 | √ | √ | | 2 | Unexecuted |
Initially, therefore, only P1 has a dependency degree of 0, so P1 is executed first; after P1 completes, its execution state is updated, giving Table 2:
TABLE 2
Unit | P1 | P2 | P3 | Degree of dependence | Execution state |
---|---|---|---|---|---|
P1 | | | | 0 | Execution completed |
P2 | √ | | | 1 | Unexecuted |
P3 | √ | √ | | 2 | Unexecuted |
At this point P2 and P3 are still unexecuted, so the dependency degrees and dependency relationships in Table 2 must be updated, as follows:
Since P2 and P3 both depend on P1, the completion of P1 reduces the dependency degree of each of them by 1, giving Table 3:
TABLE 3
Unit | P1 | P2 | P3 | Degree of dependence | Execution state |
---|---|---|---|---|---|
P1 | | | | 0 | Execution completed |
P2 | √ | | | 0 | Unexecuted |
P3 | √ | √ | | 1 | Unexecuted |
Then, according to Table 3, P2 (whose dependency degree is now 0) is executed, and the next cycle begins.
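The table updates in this example can be reproduced mechanically. The sketch below hard-codes the example's dependency relationships, renders rows in the layout of Tables 1 to 3 and applies one completion step; the function names `render` and `complete` are illustrative only.

```python
from typing import Dict, List, Set

UNITS = ["P1", "P2", "P3"]
# Dependency relationships from the example: P2 needs P1; P3 needs P1 and P2.
DEPENDS_ON: Dict[str, Set[str]] = {"P1": set(), "P2": {"P1"}, "P3": {"P1", "P2"}}


def render(degree: Dict[str, int], done: Set[str]) -> List[str]:
    """Rows in the layout of Tables 1 to 3; a √ means the row unit depends on the column unit."""
    lines = [" | ".join(["Unit", *UNITS, "Degree of dependence", "Execution state"])]
    for u in UNITS:
        marks = ["√" if c in DEPENDS_ON[u] else "" for c in UNITS]
        state = "Execution completed" if u in done else "Unexecuted"
        lines.append(" | ".join([u, *marks, str(degree[u]), state]))
    return lines


def complete(unit: str, degree: Dict[str, int], done: Set[str]) -> None:
    """Mark `unit` completed (Table 2), then decrement its unexecuted dependants (Table 3)."""
    done.add(unit)
    for u in UNITS:
        if u not in done and unit in DEPENDS_ON[u]:
            degree[u] -= 1
```

Starting from the degrees of Table 1 (`{"P1": 0, "P2": 1, "P3": 2}`), calling `complete("P1", degree, done)` leaves `{"P1": 0, "P2": 0, "P3": 1}`, which matches Table 3.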
The key technologies of the invention are as follows:
In the invention, a data processing technology component is built for the technical implementation of a single data processing scenario. Different big data technology implementations are encapsulated in technology components according to the data processing requirements, which decouples the data processing service from the concrete component implementation. Technology components are loaded and invoked by the data processing service through reflection according to a unified interface standard; each class of data processing service defines a unified component interface standard so that technology components can be reused. Multiple technology components can be built under each data processing service, and each component can have multiple versions to accommodate changes in the data processing service and upgrades of the underlying big data technologies.
In the invention, data processing algorithm plug-ins are built for the complex logic units within a data processing scenario. Algorithm plug-ins are loaded and invoked by the technology component through reflection according to a unified interface standard, and fall into two types according to the calling mode: the first is internal loading, which requires the plug-in to be implemented with the same technology as the technology component that calls it; the second is HTTP invocation, which integrates data processing algorithm services by calling them through an HTTP interface. Multiple algorithm plug-ins can be built under each technology component, and each plug-in can have multiple versions to accommodate extensions and changes to the data processing logic.
In the invention, each single data processing scenario is built as an independent data processing unit, and the user configures a data processing unit description model according to actual needs. As shown in Fig. 2, in one specific implementation the description model consists of three parts: a data processing task, data source objects and data storage target objects:
Data processing task: describes the data processing service, data processing technology component and data processing algorithm plug-in that the task invokes, together with control parameters and similar settings.
Data source object: the data input to the data processing task; there may be more than one.
Data storage target object: the target in which the execution result of the data processing task is stored; there may be more than one.
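The three-part description model maps naturally onto a small set of record types. A sketch assuming a dictionary-shaped (for example, parsed JSON) configuration; the field names and the rule that a unit needs at least one source and one target are assumptions, not taken from Fig. 2.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataProcessingTask:
    """Which service, component and plug-in to call, plus control parameters."""
    service: str
    component: str
    plugin: str
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DataProcessingUnitModel:
    """Hypothetical description model: task + source objects + target objects."""
    name: str
    task: DataProcessingTask
    sources: List[str]   # identifiers of data source objects (one or more)
    targets: List[str]   # identifiers of data storage target objects (one or more)

    def validate(self) -> None:
        # Assumed rule: every unit reads and writes at least one data object.
        if not self.sources or not self.targets:
            raise ValueError(f"{self.name}: needs at least one source and one target")


def from_config(cfg: Dict[str, Any]) -> DataProcessingUnitModel:
    """Build and validate a unit model from a dict such as parsed JSON."""
    t = cfg["task"]
    model = DataProcessingUnitModel(
        name=cfg["name"],
        task=DataProcessingTask(t["service"], t["component"], t["plugin"],
                                dict(t.get("params", {}))),
        sources=list(cfg["sources"]),
        targets=list(cfg["targets"]),
    )
    model.validate()
    return model
```

Keeping sources and targets as plain object identifiers is what later allows the dependency table to be derived purely by comparing identifiers across units.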
Note that the system registers data entities of different origins and structures as unified data objects.
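Unified registration matters because the dependency table detects dependencies by comparing data object identifiers, so the same physical entity must always resolve to the same identifier. A hypothetical registry keyed by (kind, locator), using random UUIDs as identifiers, is one way to guarantee this; the kinds and locator formats shown are illustrative assumptions.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DataObject:
    object_id: str
    kind: str      # illustrative kinds: "mysql-table", "hdfs-path", "kafka-topic"
    locator: str   # kind-specific address of the underlying entity


class DataObjectRegistry:
    """Hypothetical registry giving every registered entity one stable identifier."""

    def __init__(self) -> None:
        self._by_id: Dict[str, DataObject] = {}
        self._by_key: Dict[Tuple[str, str], str] = {}

    def register(self, kind: str, locator: str) -> str:
        key = (kind, locator)
        if key in self._by_key:            # re-registering returns the existing ID
            return self._by_key[key]
        obj = DataObject(str(uuid.uuid4()), kind, locator)
        self._by_id[obj.object_id] = obj
        self._by_key[key] = obj.object_id
        return obj.object_id

    def get(self, object_id: str) -> DataObject:
        return self._by_id[object_id]
```

If one unit registers an HDFS path as its target and another registers the same path as its source, both receive the same identifier, and the dependency between them is detected.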
Data processing units are connected and combined according to business requirements and arranged into a step-by-step data processing flow, enabling integrated complex data processing scenarios that cover data acquisition, cleaning, convergence, index aggregation, association analysis, intelligent analysis and so on.
Data processing units are linked through their input and output data objects, from which the data processing unit dependency relationship table is built; each data processing unit is then executed asynchronously according to this table.
The invention also provides a system implementing the assembled integrated big data processing method, comprising:
A first definition module, configured to predefine data processing scenarios, data processing services, data processing technology components and data processing algorithm plug-ins; each data processing service provides a service calling interface through which it is invoked, each technology component provides a component calling interface through which it is invoked, and each algorithm plug-in provides a plug-in calling interface through which it is invoked;
A second definition module, configured to predefine data objects; specifically, data source objects and data storage target objects are uniformly registered as data objects, each with a unique data object identifier;
A division module, configured to load the data processing flow model to be processed and divide it into n data processing scenarios;
A data processing unit generation module, configured to generate the data processing unit corresponding to each data processing scenario, as follows:
1) From the defined data objects, determine the data objects that serve as data sources of the data processing unit (data source objects) and the data objects that store the results after the unit executes (data storage target objects);
2) Select the required data processing services, technology components and algorithm plug-ins according to the requirements of the data processing scenario, assemble and package them according to the calling relationships among them, and associate the data source objects and data storage target objects, to obtain the configured data processing unit;
A dependency relationship table building module, configured to establish a data processing unit dependency relationship table from the associations between the data source objects and data storage target objects of the data processing units. The table stores the dependency degree of each data processing unit and its dependency relationships with the other units, and records the execution state of each unit, which is either unexecuted or execution completed;
Specifically, for the ith of the n data processing units, i = 1, 2, 3, …, n, the dependency degree and dependency relationships are calculated as follows:
For the jth of the other n-1 data processing units, j ≠ i, j = 1, 2, 3, …, n: if some data object that serves as a data storage target object of the jth unit is the same as some data object that serves as a data source object of the ith unit, the ith unit depends on the jth unit, and the dependency degree of the ith unit is increased by 1. Comparing the ith unit against each of the other n-1 units in turn yields its dependency degree and dependency relationships;
A data processing unit execution module, configured to perform the following:
Step 1, read the data processing unit dependency relationship table and collect every data processing unit that is marked as unexecuted and has a dependency degree of 0 into a data processing unit set;
Step 2, traverse the data processing unit set and execute each unit in it asynchronously. When a unit finishes executing, it stores its execution result in the corresponding data storage target object, and its execution state in the dependency relationship table is updated to execution completed;
In this step, each data processing unit is executed as follows:
1) Read the data source according to the configured data source object;
2) Call the data processing service through service gateway routing; the data processing service loads and invokes the corresponding data processing technology component through reflection, and the technology component loads and invokes the corresponding data processing algorithm plug-in through reflection;
The invoked algorithm plug-in processes the data source and returns an intermediate result to the invoking technology component; the technology component processes the intermediate result further to obtain the data processing result and returns it to the invoking data processing service. The data processing result obtained by the service is the execution result of the unit;
Step 3, read the current data processing unit dependency relationship table and determine whether any data processing unit is still marked as unexecuted. If none is, end the flow; otherwise, go to Step 4;
Step 4, update the dependency degrees and dependency relationships in the data processing unit dependency relationship table to obtain an updated table, as follows:
For each of the n data processing units (the ith, i = 1, 2, 3, …, n) that has just completed execution, find in the dependency relationship table every unit that is unexecuted and depends on the ith unit, and decrease the dependency degree of each such unit by 1;
Step 5, return to Step 1 and repeat.
According to the invention, an execution instance of each data processing unit is built automatically from its description model, and the configured data processing service interface is called to execute it. The data processing service loads and invokes the technology component through reflection, and the technology component in turn loads and invokes the algorithm plug-in, so that services, technology components and algorithm plug-ins can all be plugged in flexibly, which gives the system high extensibility.
The invention provides a big data processing system and method assembled flexibly from scenarios, technologies and algorithms. It enables the integrated construction of applications that fuse different big data technologies and of complex application scenarios, and improves system development efficiency.
The foregoing is merely a preferred embodiment of the invention. It should be noted that those skilled in the art may make improvements and modifications without departing from the principles of the invention, and such improvements and modifications shall also fall within the scope of protection of the invention.
Claims (6)
1. An assembled integrated big data processing method, characterized by comprising the following steps:
Step 1, predefine data processing scenarios, data processing services, data processing technology components and data processing algorithm plug-ins; each data processing service provides a service calling interface through which it is invoked, each technology component provides a component calling interface through which it is invoked, and each algorithm plug-in provides a plug-in calling interface through which it is invoked;
Step 2, predefine data objects; specifically, data source objects and data storage target objects are uniformly registered as data objects, each with a unique data object identifier;
Step 3, load the data processing flow model to be processed and divide it into n data processing scenarios;
Generate the data processing unit corresponding to each data processing scenario, as follows:
1) From the data objects defined in Step 2, determine the data objects that serve as data sources of the data processing unit, called data source objects, and the data objects that store the results after the unit executes, called data storage target objects;
2) Select the required data processing services, technology components and algorithm plug-ins according to the requirements of the data processing scenario, assemble and package them according to the calling relationships among them, and associate the data source objects and data storage target objects, to obtain the configured data processing unit;
Step 4, establish a data processing unit dependency relationship table from the associations between the data source objects and data storage target objects of the data processing units. The table stores the dependency degree of each data processing unit and its dependency relationships with the other units, and records the execution state of each unit, which is either unexecuted or execution completed;
Specifically, for the ith of the n data processing units, i = 1, 2, 3, …, n, the dependency degree and dependency relationships are calculated as follows:
For the jth of the other n-1 data processing units, j ≠ i, j = 1, 2, 3, …, n: if some data object that serves as a data storage target object of the jth unit is the same as some data object that serves as a data source object of the ith unit, the ith unit depends on the jth unit, and the dependency degree of the ith unit is increased by 1. Comparing the ith unit against each of the other n-1 units in turn yields its dependency degree and dependency relationships;
Step 5, read the data processing unit dependency relationship table and collect every data processing unit that is marked as unexecuted and has a dependency degree of 0 into a data processing unit set;
Step 6, traverse the data processing unit set and execute each unit in it asynchronously. When a unit finishes executing, it stores its execution result in the corresponding data storage target object, and its execution state in the dependency relationship table is updated to execution completed;
In this step, each data processing unit is executed as follows:
1) Read the data source according to the configured data source object;
2) Call the data processing service through service gateway routing; the data processing service loads and invokes the corresponding data processing technology component through reflection, and the technology component loads and invokes the corresponding data processing algorithm plug-in through reflection;
The invoked algorithm plug-in processes the data source and returns an intermediate result to the invoking technology component; the technology component processes the intermediate result further to obtain the data processing result and returns it to the invoking data processing service. The data processing result obtained by the service is the execution result of the unit;
Step 7, read the current data processing unit dependency relationship table and determine whether any data processing unit is still marked as unexecuted. If none is, end the flow; otherwise, go to Step 8;
Step 8, update the dependency degrees in the data processing unit dependency relationship table to obtain an updated table, as follows:
For each of the n data processing units (the ith, i = 1, 2, 3, …, n) that has just completed execution, find in the dependency relationship table every unit that is unexecuted and depends on the ith unit, and decrease the dependency degree of each such unit by 1;
Step 9, return to Step 5 and repeat.
2. The assembled integrated big data processing method according to claim 1, wherein each class of data processing service is configured with multiple versions; each class of data processing technology component is configured with multiple versions; and each class of data processing algorithm plug-in is configured with multiple versions.
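One way to hold multiple versions of services, technology components and plug-ins side by side is a registry keyed by kind, name and version. This is a hypothetical sketch: the dotted numeric version format and the rule of defaulting to the highest version are assumptions, not requirements of the claim.

```python
from typing import Callable, Dict, Optional, Tuple


class VersionedRegistry:
    """Hypothetical registry mapping (kind, name, version) to a factory."""

    def __init__(self) -> None:
        self._impls: Dict[Tuple[str, str], Dict[str, Callable[[], object]]] = {}

    def register(self, kind: str, name: str, version: str,
                 factory: Callable[[], object]) -> None:
        self._impls.setdefault((kind, name), {})[version] = factory

    def create(self, kind: str, name: str, version: Optional[str] = None) -> object:
        versions = self._impls[(kind, name)]
        if version is None:
            # Default to the highest dotted numeric version, so "1.10" beats "1.9".
            version = max(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
        return versions[version]()
```

Comparing versions as integer tuples rather than strings avoids the common mistake of ranking "1.9" above "1.10"; a unit that pins an explicit version keeps working when a newer one is registered.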
3. The assembled integrated big data processing method according to claim 1, wherein the data processing scenarios include a data acquisition scenario, a data synchronization scenario, a data aggregation scenario, a data cleaning scenario and a data analysis scenario.
4. The assembled integrated big data processing method according to claim 3, wherein the data acquisition scenario collects data from the data source and stores it in a designated data storage target object;
the data synchronization scenario synchronizes the data source between different data storage target objects;
the data aggregation scenario builds dimensions over the base data and constructs a business wide table through dimension association;
the data cleaning scenario performs cleaning and standardization of dirty data;
the data analysis scenario performs data mining analysis.
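The "business wide table through dimension association" of the data aggregation scenario can be illustrated as a left join of fact rows against dimension lookups. A minimal in-memory sketch; the row format is hypothetical, and it assumes dimension attribute names do not clash with fact columns (a clash would overwrite the fact value).

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def build_wide_table(facts: List[Row], dimensions: Dict[str, Dict[Any, Row]]) -> List[Row]:
    """Left-join each fact row with every dimension, keyed by the fact's key column.

    `dimensions` maps a key column name (e.g. "customer_id") to a lookup from
    key value to that dimension's attributes. Facts with no matching dimension
    entry keep their original columns only.
    """
    wide = []
    for fact in facts:
        row = dict(fact)
        for key_col, lookup in dimensions.items():
            row.update(lookup.get(fact.get(key_col), {}))
        wide.append(row)
    return wide
```

For example, an order row with `customer_id` `"c1"` picks up `{"region": "east"}` from a customer dimension, while an order whose customer is missing from the dimension is kept unchanged, as a left join would keep it.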
5. The assembled integrated big data processing method according to claim 1, wherein the assembling and packaging is performed as follows:
Construct multiple classes of data processing services under each data processing scenario, each class of data processing service having multiple versions so as to adapt to different runtime environments and business scenarios;
Construct multiple classes of data processing technology components under each class of data processing service. Each class of technology component is loaded and invoked by the data processing service through reflection, in accordance with a unified interface standard, and each class of technology component provides multiple versions so that it can adapt to changes in the data processing service;
Construct multiple classes of data processing algorithm plug-ins under each class of data processing technology component. Each class of algorithm plug-in is loaded and invoked by the technology component through reflection, in accordance with a unified interface standard. According to the calling mode, the algorithm plug-ins fall into two types: the first is internal loading, which requires the plug-in to be implemented with the same technology as the calling technology component, and the component invokes the plug-in through an in-process loading call; the second is HTTP invocation, in which the technology component calls the plug-in through an HTTP interface. Each class of algorithm plug-in provides multiple versions to accommodate extensions and changes to the data processing logic.
6. A system for the assembled integrated big data processing method according to any one of claims 1 to 5, comprising:
A first definition module, configured to predefine data processing scenarios, data processing services, data processing technology components and data processing algorithm plug-ins; each data processing service provides a service calling interface through which it is invoked, each technology component provides a component calling interface through which it is invoked, and each algorithm plug-in provides a plug-in calling interface through which it is invoked;
A second definition module, configured to predefine data objects; specifically, data source objects and data storage target objects are uniformly registered as data objects, each with a unique data object identifier;
A division module, configured to load the data processing flow model to be processed and divide it into n data processing scenarios;
A data processing unit generation module, configured to generate the data processing unit corresponding to each data processing scenario, as follows:
1) From the defined data objects, determine the data objects that serve as data sources of the data processing unit (data source objects) and the data objects that store the results after the unit executes (data storage target objects);
2) Select the required data processing services, technology components and algorithm plug-ins according to the requirements of the data processing scenario, assemble and package them according to the calling relationships among them, and associate the data source objects and data storage target objects, to obtain the configured data processing unit;
A dependency relationship table building module, configured to establish a data processing unit dependency relationship table from the associations between the data source objects and data storage target objects of the data processing units. The table stores the dependency degree of each data processing unit and its dependency relationships with the other units, and records the execution state of each unit, which is either unexecuted or execution completed;
Specifically, for the ith of the n data processing units, i = 1, 2, 3, …, n, the dependency degree and dependency relationships are calculated as follows:
For the jth of the other n-1 data processing units, j ≠ i, j = 1, 2, 3, …, n: if some data object that serves as a data storage target object of the jth unit is the same as some data object that serves as a data source object of the ith unit, the ith unit depends on the jth unit, and the dependency degree of the ith unit is increased by 1. Comparing the ith unit against each of the other n-1 units in turn yields its dependency degree and dependency relationships;
A data processing unit execution module comprising:
Step 1, reading a data processing unit dependency relationship table, extracting each data processing unit marked as an unexecuted state and having a dependency degree of 0, and forming a data processing unit set;
Step 2, traversing the data processing unit set, asynchronously executing each data processing unit in the data processing unit set, after the execution of the data processing unit is completed, storing an execution result to a corresponding data storage target object by the data processing unit, and updating an execution state mark of a data processing unit dependency relationship table to mark the data processing unit as an execution completion state;
In this step, the execution method of each data processing unit is as follows:
1) Reading a data source according to the configured data source object;
2) Calling data processing service through a service gateway route; the data processing service loads and calls the corresponding data processing technology component through a reflection technology; the data processing technology component loads and calls the corresponding data processing algorithm plug-in through a reflection technology;
The called data processing algorithm plug-in performs data processing on the data source, and returns a data processing intermediate result to the called data processing technology component; the called data processing technology component further performs data processing on the data processing intermediate result to obtain a data processing result, and returns the data processing result to the called data processing service; therefore, the called data processing service obtains a data processing result, namely the execution result;
Step 3, reading the current data processing unit dependency relationship table and judging whether any data processing unit is still marked as unexecuted; if none is, ending the flow; if one is, executing Step 4;
Step 4, updating the dependency degrees in the data processing unit dependency relationship table to obtain an updated data processing unit dependency relationship table, the updating method being as follows:
for the i-th of the n data processing units (i = 1, 2, 3, …, n), once the i-th data processing unit has finished executing, searching the data processing unit dependency relationship table for every data processing unit that is in the unexecuted state and depends on the i-th data processing unit, and subtracting 1 from the dependency degree of each data processing unit found;
Step 5, returning to Step 1 and repeating the loop.
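Read as a scheduling procedure, Steps 1 to 5 resemble a round-by-round topological sort in the style of Kahn's algorithm, with the dependency degree acting as in-degree. The Python sketch below is illustrative only and is not the patent's implementation. The unit configuration keys (`deps`, `source`, `component`, `target`), the plug-in and component classes, the helper names `load`, `call_service` and `run_units`, and the use of one dictionary as both data source and data storage target are all assumptions. A `globals()` lookup by class name stands in for the reflection technology of Step 2, service gateway routing is omitted, and the cycle check is an addition that the claim does not describe.

```python
import concurrent.futures

# Illustrative sketch of Steps 1-5. All names below are hypothetical; a
# globals() lookup by class name stands in for the claim's reflection step.


def load(class_name):
    """Instantiate a class by name (stand-in for reflection-based loading)."""
    return globals()[class_name]()


class UpperCasePlugin:
    """Hypothetical data processing algorithm plug-in."""

    def process(self, rows):
        return [r.upper() for r in rows]


class CleaningComponent:
    """Hypothetical technology component: plug-in output, then dedup + sort."""

    plugin_name = "UpperCasePlugin"

    def process(self, rows):
        intermediate = load(self.plugin_name).process(rows)  # intermediate result
        return sorted(set(intermediate))  # component's further processing


class CountComponent:
    """Hypothetical technology component: counts the plug-in output."""

    plugin_name = "UpperCasePlugin"

    def process(self, rows):
        return {"rows": len(load(self.plugin_name).process(rows))}


def call_service(component_name, rows):
    """Data processing service (gateway routing omitted): load and run a component."""
    return load(component_name).process(rows)


def run_units(units, storage):
    """Run units round by round; returns the names executed in each round."""
    # Dependency relationship table: degree starts at the number of prerequisites.
    table = {u: {"deps": set(cfg["deps"]), "degree": len(cfg["deps"]), "done": False}
             for u, cfg in units.items()}
    rounds = []
    with concurrent.futures.ThreadPoolExecutor() as pool:
        while True:
            # Step 1: collect unexecuted units whose dependency degree is 0.
            ready = sorted(u for u, row in table.items()
                           if not row["done"] and row["degree"] == 0)
            if not ready:  # cycle guard added in this sketch, not in the claim
                raise RuntimeError("no runnable unit: dependency cycle?")
            rounds.append(ready)
            # Step 2: execute the ready units asynchronously, store, mark done.
            futures = {pool.submit(call_service, units[u]["component"],
                                   storage[units[u]["source"]]): u for u in ready}
            for fut in concurrent.futures.as_completed(futures):
                u = futures[fut]
                storage[units[u]["target"]] = fut.result()
                table[u]["done"] = True
            # Step 3: end the flow once no unit is left unexecuted.
            if all(row["done"] for row in table.values()):
                return rounds
            # Step 4: each finished unit lowers its unexecuted dependents by 1.
            for finished in ready:
                for row in table.values():
                    if not row["done"] and finished in row["deps"]:
                        row["degree"] -= 1
            # Step 5: loop back to Step 1.


storage = {"raw": ["b", "a", "b"]}
units = {
    "clean": {"deps": set(), "source": "raw",
              "component": "CleaningComponent", "target": "cleaned"},
    "count_raw": {"deps": set(), "source": "raw",
                  "component": "CountComponent", "target": "raw_count"},
    "count_clean": {"deps": {"clean"}, "source": "cleaned",
                    "component": "CountComponent", "target": "clean_count"},
}
rounds = run_units(units, storage)
print(rounds)  # [['clean', 'count_raw'], ['count_clean']]
print(storage["cleaned"], storage["clean_count"])  # ['A', 'B'] {'rows': 2}
```

In this example `clean` and `count_raw` have no prerequisites, so they run concurrently in the first round. `count_clean` becomes ready only after Step 4 lowers its degree to 0. Rescanning the whole table each round mirrors the claim's wording. A larger scheduler would more likely keep an adjacency list of dependents to avoid the full scan.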
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310803713.0A (CN116755804B) | 2023-07-03 | 2023-07-03 | Assembled integrated big data processing method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116755804A | 2023-09-15 |
CN116755804B | 2024-04-26 |
Family ID: 87960811
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202310803713.0A (CN116755804B) | Active | 2023-07-03 | 2023-07-03 | Assembled integrated big data processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116755804B |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007242051A * | 2007-05-21 | 2007-09-20 | Nomura Research Institute Ltd | Device for mounting/executing business logic program |
WO2016036813A2 * | 2014-09-02 | 2016-03-10 | Ab Initio Technology Llc | Controlling data processing tasks |
CN110532038A * | 2019-08-19 | 2019-12-03 | 杭州趣链科技有限公司 | Parallel execution method based on Java smart contracts |
CN110597572A * | 2018-06-13 | 2019-12-20 | 中移(苏州)软件技术有限公司 | Service calling relation analysis method and computer system |
CN111651451A * | 2020-04-25 | 2020-09-11 | 复旦大学 | Scene-driven single system micro-service splitting method |
CN111754073A * | 2020-05-19 | 2020-10-09 | 北京吉威空间信息股份有限公司 | Centralized processing and distributed operation framework construction method for spatial data service |
CN112379884A * | 2020-11-13 | 2021-02-19 | 李斌 | Spark and parallel memory computing-based process engine implementation method and system |
CN114675943A * | 2020-12-24 | 2022-06-28 | 珠海市魅族科技有限公司 | Multi-program cooperation method, system, device and medium based on different scenes |
CN115794262A * | 2022-12-07 | 2023-03-14 | 百度(中国)有限公司 | Task processing method, device, equipment, storage medium and program product |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7774751B2 * | 2003-12-26 | 2010-08-10 | Yefim Zhuk | Knowledge-driven architecture |
US20100050156A1 * | 2008-08-20 | 2010-02-25 | International Business Machines Corporation | Using build history information to optimize a software build process |
US9773070B2 * | 2014-06-30 | 2017-09-26 | Microsoft Technology Licensing, Llc | Compound transformation chain application across multiple devices |
Non-Patent Citations (1)
Title |
---|
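Design and implementation of a university internal control management system based on a workflow engine; Yu Shuang (于爽); Jiangsu University of Science and Technology (江苏科技大学); 2021-12-31; entire document * |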
Also Published As
Publication number | Publication date |
---|---|
CN116755804A | 2023-09-15 |
Similar Documents
Publication | Title |
---|---|
Wu et al. | A multilevel index model to expedite web service discovery and composition in large-scale service repositories |
Silva et al. | Exploiting common subexpressions for cloud query processing |
Fard et al. | Towards efficient query processing on massive time-evolving graphs |
US20160239544A1 | Collaborative planning for accelerating analytic queries |
Wang et al. | BENU: Distributed subgraph enumeration with backtracking-based framework |
Reza et al. | Prunejuice: pruning trillion-edge graphs to a precise pattern-matching solution |
CN112379884A | Spark and parallel memory computing-based process engine implementation method and system |
CN104615703A | RDF data distributed parallel inference method combined with Rete algorithm |
Oliveira et al. | Rigorous development of component-based systems using component metadata and patterns |
Singh et al. | A data structure perspective to the RDD-based Apriori algorithm on Spark |
Sampath et al. | An efficient weighted rule mining for web logs using systolic tree |
Gombos et al. | Spar(k)ql: SPARQL evaluation method on Spark GraphX |
CN116775041B | Real-time decision engine implementation method based on stream calculation and RETE algorithm |
CN116755804B | Assembled integrated big data processing method and system |
CN105701605A | Waveform list management module applied to integrated communication navigation identification system |
CN114138811A | Column calculation optimization method based on Spark SQL |
Tehreem et al. | Parallel architecture for implementation of frequent itemset mining using FP-growth |
Abdolazimi et al. | Connected components of big graphs in fixed mapreduce rounds |
CN110851178B | Inter-process program static analysis method based on distributed graph reachable computation |
CN111198766B | Database access operation deployment method, database access method and device |
Fegaras | Supporting bulk synchronous parallelism in map-reduce queries |
Cabodi et al. | A graph‐labeling approach for efficient cone‐of‐influence computation in model‐checking problems with multiple properties |
CN109918410B | Spark platform based distributed big data function dependency discovery method |
Lin et al. | Double resource optimization for a robust computer network subject to a transmission budget |
Elmaghraoui et al. | Dynamic web service composition using AND/OR directed graph |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |