CN107038260A - An efficient parallel loading method for keeping titan real-time data consistency
- Publication number: CN107038260A
- Application number: CN201710390469.4A
- Authority: CN (China)
- Prior art keywords: data, module, titan, pieceofdata, point
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computer Security & Cryptography (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an efficient parallel loading method that keeps titan real-time data consistent, and belongs to the field of big data processing. First, loading into the graph database titan is divided into 7 modules that work in parallel. The cleaning rule management module updates the filtering rules in real time. The data receiving module receives pieceOfData records and puts them into queue1. The data cleaning module filters the data and puts the records that pass into queue2. The ID conversion module interacts with the high-speed index module to determine whether the mapping between each of the two points in the current pieceOfData and its ID in the graph database titan exists. If it does, the points are replaced with titan's internal ID attribute and ID value, and the result is saved as pieceOfDataT and put into queue4; otherwise, the points that have not been loaded are put into a HashSet and the corresponding pieceOfData is put into queue3. The remaining-data loading module loads pieceOfDataT into titan with multiple threads in parallel. The point loading module adds the points in the HashSet to titan and adds the mapping between each point and its titan ID to the high-speed index module. Each module completes part of the work, alone or in cooperation with the others, which raises loading efficiency overall.
Description
Technical field
The invention belongs to the field of big data processing and relates to an efficient and safe method for preprocessing and loading real-time data into a graph database, specifically an efficient parallel loading method that keeps titan real-time data consistent.
Background
With the continuous development of computer technology and the steady rise in the level of informatization, data volumes are growing rapidly and data structures are becoming more complex. Traditional relational databases are difficult to use in many scenarios, which has given rise to a variety of non-relational databases.
Graph databases are one kind of non-relational database and are well suited to storing relational network data. Among the many graph databases, titan is an excellent, easy-to-use distributed graph database with high scalability: enlarging the cluster raises the upper limit of graph storage linearly, and it supports storage and retrieval of very large graphs. It is therefore used in many scenarios. When loading real-time data, however, titan can only load with a single thread in order to guarantee data consistency. Real-time loading is therefore inefficient and severely limited, and cannot meet the loading demands of high-volume real-time data.
Summary of the invention
To address the inefficiency and lack of safety of the graph database titan when handling high-volume real-time data in the prior art, the invention provides an efficient parallel loading method that keeps titan real-time data consistent.
The specific steps are as follows:
Step 1: loading into the graph database titan is divided into 7 modules, and the 7 modules run in parallel.
The 7 modules are: the data receiving module, the cleaning rule management module, the data cleaning module, the ID conversion module, the high-speed index module, the point loading module, and the remaining-data loading module.
The data receiving module receives the data to be processed and puts it into a bounded queue.
The cleaning rule management module updates the filtering rules dynamically by monitoring the rule file.
The data cleaning module filters unwanted data out of the bounded queue according to the rules given by the cleaning rule management module.
The ID conversion module replaces each point in the cleaned data with the ID of the corresponding point in the graph database.
The high-speed index module speeds up ID conversion.
The point loading module loads the points that were found to be missing from the graph database during ID conversion and, once loading is complete, adds each point and its ID mapping to the high-speed index module.
The remaining-data loading module substantially raises the loading speed of graph data by loading in parallel.
Step 2: the threads of the data receiving module work in parallel. Each thread loops, obtaining data from a data source such as a message queue or CSV file, parsing it into multiple pieceOfData records, and putting them into bounded queue queue1.
A pieceOfData record consists of two points, the relation between the two points, and the attributes of the points and the relation.
Bounded queue queue1 stores the data obtained from the data source.
Step 3: the cleaning rule management module reads the rule configuration file on a timer, or reads it when a client request is received, and dynamically updates the filtering rules in real time.
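The rule refresh in Step 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `RuleManager`, the use of the file modification time to detect changes, and the shape of the JSON rules are all assumptions made for the example. A timer thread or a client-request handler would simply call `reload_if_changed()`.

```python
import json
import os
import threading


class RuleManager:
    """Holds the current filter rules and re-reads the rule file when it changes.

    Hypothetical helper: the patent only states that the rule file is read
    on a timer or on client request.
    """

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._mtime = None
        self._rules = []
        self.reload_if_changed()

    def reload_if_changed(self):
        """Re-read the file if its modification time changed; return True on reload."""
        mtime = os.stat(self._path).st_mtime_ns
        if mtime == self._mtime:
            return False
        with open(self._path, encoding="utf-8") as fh:
            rules = json.load(fh)
        with self._lock:
            # Swap rules and timestamp together so readers never see a mix.
            self._rules, self._mtime = rules, mtime
        return True

    def rules(self):
        """Return a copy of the rules so cleaning threads never see a partial update."""
        with self._lock:
            return list(self._rules)
```

Handing out a copy keeps the cleaning threads independent of the reload: a thread that is halfway through checking a record continues with the rule list it started with.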
Step 4: the threads of the data cleaning module work in parallel. Each thread loops, taking one pieceOfData record at a time from bounded queue queue1 and checking it against the cleaning rules. If the record meets a filter condition it is discarded; otherwise it is put into bounded queue queue2.
queue2 stores the data from bounded queue queue1 that remain after filtering.
Step 5: the threads of the ID conversion module work in parallel. Each thread loops, taking a cleaned pieceOfData record from bounded queue queue2 and processing it.
The specific steps are:
Step 501: judge whether the mappings between the two points in the current pieceOfData record and their titan internal IDs are both present in the high-speed index module. If so, go to step 502; otherwise, go to step 503.
Step 502: the ID conversion module takes the mappings from the high-speed index module, replaces the corresponding points in the pieceOfData record with the ID attribute and ID value, saves the result as a pieceOfDataT record, and puts the pieceOfDataT record into bounded queue queue4.
A pieceOfDataT record is the pieceOfData record after its points have been replaced by the corresponding ID attributes and ID values.
queue4 stores pieceOfDataT records.
Step 503: the mapping between at least one point in the current pieceOfData record and its titan internal ID has not been loaded into the high-speed index module. The ID conversion module puts the points that have not been loaded into a HashSet and puts the pieceOfData record into bounded queue queue3.
queue3 stores the pieceOfData records taken from bounded queue queue2 in which the mapping between at least one point and its titan internal ID has not been loaded into the high-speed index module.
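Steps 501 to 503 can be sketched as follows. This is only an illustration under stated assumptions: a plain `dict` stands in for the high-speed index module, a Python `set` guarded by a lock stands in for the HashSet, a record is modelled as a 4-tuple, and the ID values are made up.

```python
import queue
import threading

index = {"uid=1": 101, "uid=2": 102}  # stand-in index: point -> titan internal ID (made-up values)
pending_points = set()                # stand-in for the HashSet of points not yet loaded
pending_lock = threading.Lock()
queue3 = queue.Queue(maxsize=1000)    # records waiting for their points to be loaded
queue4 = queue.Queue(maxsize=1000)    # converted records (pieceOfDataT)


def convert(piece):
    """Run steps 501-503 for one record; piece = (src_point, dst_point, relation, props)."""
    src, dst, relation, props = piece
    src_id, dst_id = index.get(src), index.get(dst)
    if src_id is not None and dst_id is not None:
        # Step 502: both mappings exist, so replace the points with their IDs.
        queue4.put((src_id, dst_id, relation, props))
        return True
    # Step 503: remember the unknown points and park the record in queue3.
    with pending_lock:
        pending_points.update(p for p, i in ((src, src_id), (dst, dst_id)) if i is None)
    queue3.put(piece)
    return False
```

Because the pending points are kept in a set, a point that appears in many parked records is queued for loading only once.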
Step 6: the threads of the remaining-data loading module work in parallel. Each thread loops, taking pieceOfDataT records from bounded queue queue4 and loading them into the titan database.
Step 7: the point loading module interacts with the high-speed index module and, once the termination condition is met, ends all threads.
The specific steps are as follows:
Step 701: judge whether the termination condition is met. If so, all threads end; otherwise, go to step 702.
Step 702: judge whether bounded queue queue3 is full, or whether the time since the data in the HashSet were last loaded exceeds time threshold t. If so, perform step 703; otherwise, sleep for time t1 and return to step 701.
Threshold t is a system initialization parameter, set according to actual conditions.
Step 703: each thread of the point loading module loads points from the HashSet and adds the mapping between each point and its titan internal ID to the high-speed index module.
Step 704: the point loading module resets the HashSet and records the current time as the time at which the data in the HashSet were loaded.
Step 705: put all pieceOfData records in bounded queue queue3 into bounded queue queue2 and empty bounded queue queue3; return to step 701.
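The control loop of steps 701 to 705 can be sketched as follows. The function names, the `state` dictionary, and the `load_point` callback (a stand-in for whatever writes a point to titan and returns its internal ID) are hypothetical. One deliberate difference from the text: the sketch snapshots and clears the set before loading rather than after, so that points added concurrently by the ID conversion threads are not lost.

```python
import queue
import threading
import time


def should_flush(queue3, last_load, now, t):
    """Step 702: flush if queue3 is full or more than t seconds have passed since the last load."""
    return queue3.full() or (now - last_load) > t


def flush(pending_points, pending_lock, index, queue3, queue2, load_point):
    """Steps 703-705: load pending points, index them, and requeue the parked records."""
    with pending_lock:                      # step 704 (done first here): take the batch, reset the set
        batch = set(pending_points)
        pending_points.clear()
    for point in batch:                     # step 703: load each point, record point -> ID
        index[point] = load_point(point)
    moved = 0
    while True:                             # step 705: move everything in queue3 back to queue2
        try:
            piece = queue3.get_nowait()
        except queue.Empty:
            return moved
        queue2.put(piece)
        moved += 1


def point_loader_loop(state, finished, t, t1):
    """Steps 701-705 as a loop; state holds the keyword arguments that flush() needs."""
    last_load = time.monotonic()
    while not finished():                   # step 701: stop once the termination condition holds
        if should_flush(state["queue3"], last_load, time.monotonic(), t):
            flush(**state)
            last_load = time.monotonic()    # step 704: remember when the set was loaded
        else:
            time.sleep(t1)                  # step 702: sleep for t1, then re-check
```

The time threshold t matters when traffic is light: without it, a few parked records could wait indefinitely for queue3 to fill.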
The advantages of the invention are:
1) The efficient parallel loading method for keeping titan real-time data consistent greatly improves the loading performance of titan real-time data; loading speed increases by more than 20 times.
2) The efficient parallel loading method for keeping titan real-time data consistent is an efficient and safe method for preprocessing and loading real-time data. It greatly improves data loading efficiency while maintaining data consistency, and data filtering rules can be modified and added in real time.
Brief description of the drawings
Fig. 1 is a structural diagram of the 7 modules into which loading into the graph database titan is divided according to the invention;
Fig. 2 is a flow chart of the efficient parallel loading method for keeping titan real-time data consistent according to the invention.
Detailed description of the embodiments
The specific implementation of the invention is described in detail below with reference to the accompanying drawings.
To greatly improve the loading performance of titan real-time data while guaranteeing data consistency, the invention proposes an efficient parallel loading method that keeps titan real-time data consistent. It comprises three main parts: real-time data cleaning, ingestion (storage) control, and handling of new points.
The rule management thread dynamically updates the filtering rules in real time. The main thread receives pieceOfData records and puts them into bounded queue queue1. The data cleaning module filters out unqualified data according to the cleaning rules and puts the rest into bounded queue queue2. The ID conversion module takes data from bounded queue queue2 and interacts with the high-speed index module to judge whether the mapping between each of the two points in the current pieceOfData record and its ID in the graph database titan exists. If it does, the module takes the titan internal ID attribute and ID value for each point from the index, replaces the points with them, saves the result as a pieceOfDataT record, and puts it into bounded queue queue4. Otherwise, the points that have not been loaded are put into the HashSet and the corresponding pieceOfData record is put into bounded queue queue3. The remaining-data loading module takes pieceOfDataT records from bounded queue queue4 and loads them into titan with multiple threads in parallel.
The point loading control thread judges whether data cleaning has finished. If it has not, the thread goes on to check whether bounded queue queue3 is full. If it is not full, the thread sleeps for a while, waiting for queue3 to fill. Otherwise, multiple threads load the points in the HashSet and add the mapping between each point and its titan internal ID to the high-speed index module. The point loading module then resets the HashSet, puts all pieceOfData records in bounded queue queue3 back into bounded queue queue2, and empties bounded queue queue3.
The specific steps, shown in Fig. 2, are as follows:
Step 1: loading into the graph database titan is divided into 7 modules, and the 7 modules run in parallel.
As shown in Fig. 1, the 7 modules are: the data receiving module, the cleaning rule management module, the data cleaning module, the ID conversion module, the high-speed index module, the point loading module, and the remaining-data loading module. Each module completes part of the work, alone or in cooperation with the others, which raises loading efficiency overall.
The first module, the data receiving module, receives the data to be processed from sources such as a message queue or CSV file and puts it into a bounded queue.
The second module, the data cleaning module, filters out unwanted data according to the given rules. The rules include exact matching, fuzzy matching, and regular-expression matching.
The third module, the cleaning rule management module, updates the filtering rules dynamically by monitoring the rule file.
The filtering rule file is a JSON file; its concrete structure is shown in Annex 1.
Filtering rule file:
The fourth module, the ID conversion module, replaces each point in the data with the ID of the corresponding point in the graph database.
The fifth module, the high-speed index module, has a key-value structure and speeds up ID conversion in the fourth module.
The sixth module, the point loading module, loads the points that the fourth module found to be missing from the graph database during ID conversion and, once loading is complete, adds each point and its ID mapping to the high-speed index module.
The seventh module, the remaining-data loading module, substantially raises the loading speed of graph data by loading in parallel.
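The key-value structure of the high-speed index module can be pictured with a small sketch. The class `HighSpeedIndex` and its method names are hypothetical; the text only states that the index is of key-value type and maps points to their IDs. The lookup works on a pair of points because the ID conversion module always needs both endpoints of a record.

```python
import threading


class HighSpeedIndex:
    """Thread-safe key-value map from a point (e.g. "uid=9867") to its internal ID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = {}

    def put(self, point, internal_id):
        """Record the mapping once the point has been loaded into the graph database."""
        with self._lock:
            self._ids[point] = internal_id

    def lookup_pair(self, src, dst):
        """Return (src_id, dst_id) if both points are indexed, otherwise None."""
        with self._lock:
            src_id, dst_id = self._ids.get(src), self._ids.get(dst)
        if src_id is None or dst_id is None:
            return None
        return src_id, dst_id
```

Answering from memory avoids a round trip to the graph database for every record, which is presumably where the speed-up in ID conversion comes from.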
Step 2: the threads of the data receiving module work in parallel. Each thread loops, obtaining graph data from a data source such as a message queue or CSV file, parsing it into multiple pieceOfData records, and putting them into bounded queue queue1.
Graph data is any topological graph data made up of points and edges.
A pieceOfData record consists of two points, the relation between the two points, and the attributes of the points and the relation. A point is a key-value pair that uniquely identifies a specific point, for example uid=9867.
Bounded queue queue1 stores the data obtained from the data source.
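As an illustration of Step 2, the sketch below models a pieceOfData record and parses CSV text into a bounded queue. The field names, the CSV column layout (`src`, `relation`, `dst`, plus arbitrary attribute columns), and the function names are assumptions made for the example; the text does not specify the input format.

```python
import csv
import io
import queue
from dataclasses import dataclass, field


@dataclass
class PieceOfData:
    src: tuple                 # a point as a key-value pair, e.g. ("uid", "9867")
    dst: tuple
    relation: str
    props: dict = field(default_factory=dict)


def parse_csv(text):
    """Yield one PieceOfData per CSV row; any extra columns become attributes."""
    for row in csv.DictReader(io.StringIO(text)):
        src = tuple(row.pop("src").split("=", 1))
        dst = tuple(row.pop("dst").split("=", 1))
        relation = row.pop("relation")
        yield PieceOfData(src, dst, relation, dict(row))


def receive(text, queue1):
    """Put every parsed record into queue1 and return how many were queued."""
    count = 0
    for piece in parse_csv(text):
        queue1.put(piece)      # blocks while the bounded queue is full
        count += 1
    return count
```

A bounded queue gives back-pressure for free: when the downstream modules fall behind, `put` blocks and the receiving threads slow down instead of exhausting memory.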
Step 3: the cleaning rule management module reads the rule configuration file on a timer, or reads it when a client request is received, and dynamically updates the filtering rules in real time.
Step 4: the threads of the data cleaning module work in parallel. Each thread loops, taking one pieceOfData record at a time from bounded queue queue1 and checking it against the cleaning rules. If the record meets a filter condition it is discarded; otherwise it is put into bounded queue queue2.
queue2 stores the data from bounded queue queue1 that remain after filtering.
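The filtering in Step 4 can be sketched with the three rule kinds named earlier (exact, fuzzy, and regular-expression matching). The rule layout (`field`, `type`, `value`) is hypothetical and is not the structure from Annex 1, and "fuzzy" is interpreted here as a substring match purely for illustration.

```python
import re


def matches(rule, value):
    """Return True if value satisfies the rule's match condition."""
    kind, pattern = rule["type"], rule["value"]
    if kind == "exact":
        return value == pattern
    if kind == "fuzzy":
        return pattern in value            # assumption: fuzzy means substring match
    if kind == "regex":
        return re.search(pattern, value) is not None
    raise ValueError("unknown rule type: " + kind)


def should_discard(record, rules):
    """A record is discarded as soon as any rule matches one of its fields."""
    return any(
        rule["field"] in record and matches(rule, str(record[rule["field"]]))
        for rule in rules
    )


def clean(records, rules):
    """Keep only the records that no rule filters out (these would go to queue2)."""
    return [r for r in records if not should_discard(r, rules)]
```

For instance, a rule `{"field": "uid", "type": "regex", "value": "^test"}` would drop every record whose `uid` starts with `test`.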
Step 5: the threads of the ID conversion module work in parallel. Each thread loops, taking a cleaned pieceOfData record from bounded queue queue2 and judging whether the mappings between the two points in the current pieceOfData record and their titan internal IDs are both present in the high-speed index module. If so, go to step 6; otherwise, go to step 8.
Step 6: the ID conversion module takes the mappings from the high-speed index module, replaces the corresponding points in the pieceOfData record with the titan internal ID attribute and ID value, saves the result as a pieceOfDataT record, and puts it into bounded queue queue4.
A pieceOfDataT record is the pieceOfData record after its points have been replaced by the corresponding ID attributes and ID values.
queue4 stores pieceOfDataT records.
Step 7: the threads of the remaining-data loading module work in parallel. Each thread loops, taking pieceOfDataT records from bounded queue queue4 and loading them into the titan database; return to step 5.
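The parallel loading in Step 7 can be sketched as a pool of worker threads draining queue4. The `load_edge` callback is a hypothetical stand-in for the actual write to titan (no real titan API is shown), and the `STOP` sentinel is one common way to shut workers down, not something the text prescribes.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to exit


def loader_worker(queue4, load_edge):
    """Loop: take one pieceOfDataT record from queue4 and load it."""
    while True:
        item = queue4.get()
        if item is STOP:
            break
        load_edge(item)


def start_loaders(queue4, load_edge, n_threads):
    """Start n_threads workers that drain queue4 in parallel."""
    threads = [
        threading.Thread(target=loader_worker, args=(queue4, load_edge))
        for _ in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    return threads


def stop_loaders(queue4, threads):
    """Queue one sentinel per worker, then wait for all workers to finish."""
    for _ in threads:
        queue4.put(STOP)
    for thread in threads:
        thread.join()
```

Each pieceOfDataT record already refers to its two points by internal ID, so these workers never need to create a point themselves; that work stays with the point loading module.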
Step 8: the mapping between at least one point in the current pieceOfData record and its titan internal ID has not been loaded into the high-speed index module. The ID conversion module puts the points that have not been loaded into the HashSet and puts the corresponding pieceOfData record into bounded queue queue3.
queue3 stores the pieceOfData records taken from bounded queue queue2 in which the mapping between at least one point and its titan internal ID has not been loaded into the high-speed index module.
Step 9: judge whether bounded queue queue3 is full or the time has reached the set threshold t. If so, perform step 10; otherwise, return to step 5.
Threshold t is a system initialization parameter, set according to actual conditions.
When either condition is met, that is, bounded queue queue3 is full or the time has reached the set threshold t, the flow continues to the next step. Conversely, when bounded queue queue3 is not full and the time has not reached the set threshold t, the current thread sleeps and waits. Meanwhile, for data in bounded queue queue2 whose mappings are not in the high-speed index module, the ID conversion module keeps putting the points into the HashSet and the corresponding pieceOfData records into bounded queue queue3, until bounded queue queue3 is full or the time reaches the set threshold t.
Step 10: the point loading module judges whether data cleaning has finished. If so, all threads end; otherwise, go to step 11.
Step 11: each thread of the point loading module loads points from the HashSet and adds the mapping between each point and its titan internal ID to the high-speed index module.
Step 12: the point loading module resets the HashSet, puts all pieceOfData records in bounded queue queue3 into bounded queue queue2, and empties bounded queue queue3; return to step 5.
It should be noted and understood that various modifications and improvements can be made to the invention described in detail above without departing from the spirit and scope of the invention as required by the appended claims. The scope of the claimed technical solution is therefore not limited by any particular exemplary teaching given.
Claims (4)
1. An efficient parallel loading method for keeping titan real-time data consistent, characterized in that the specific steps are as follows:
Step 1: loading into the graph database titan is divided into 7 modules, and the 7 modules run in parallel;
the 7 modules are: a data receiving module, a cleaning rule management module, a data cleaning module, an ID conversion module, a high-speed index module, a point loading module, and a remaining-data loading module;
Step 2: the threads of the data receiving module work in parallel; each thread loops, obtaining data from a data source such as a message queue or CSV file, parsing it into multiple pieceOfData records, and putting them into bounded queue queue1;
a pieceOfData record consists of two points, the relation between the two points, and the attributes of the points and the relation;
Step 3: the cleaning rule management module reads the rule configuration file on a timer, or reads it when a client request is received, and dynamically updates the filtering rules in real time;
Step 4: the threads of the data cleaning module work in parallel; each thread loops, taking one pieceOfData record at a time from bounded queue queue1 and checking it against the cleaning rules; if the record meets a filter condition it is discarded, otherwise it is put into bounded queue queue2;
Step 5: the threads of the ID conversion module work in parallel; each thread loops, taking a cleaned pieceOfData record from bounded queue queue2 and processing it;
Step 6: the threads of the remaining-data loading module work in parallel; each thread loops, taking pieceOfDataT records from bounded queue queue4 and loading them into the titan database;
Step 7: the point loading module interacts with the high-speed index module and, once the termination condition is met, ends all threads.
2. The efficient parallel loading method for keeping titan real-time data consistent according to claim 1, characterized in that, in step 1, the data receiving module receives the data to be processed and puts it into a bounded queue;
the cleaning rule management module updates the filtering rules dynamically by monitoring the rule file;
the data cleaning module filters unwanted data out of the bounded queue according to the rules given by the cleaning rule management module;
the ID conversion module replaces each point in the cleaned data with the ID of the corresponding point in the graph database;
the high-speed index module speeds up ID conversion;
the point loading module loads the points that were found to be missing from the graph database during ID conversion and, once loading is complete, adds each point and its ID mapping to the high-speed index module;
the remaining-data loading module substantially raises the loading speed of graph data by loading in parallel.
3. The efficient parallel loading method for keeping titan real-time data consistent according to claim 1, characterized in that step 5 is specifically:
Step 501: judge whether the mappings between the two points in the current pieceOfData record and their titan internal IDs are both present in the high-speed index module; if so, go to step 502; otherwise, go to step 503;
Step 502: the ID conversion module takes the mappings from the high-speed index module, replaces the corresponding points in the pieceOfData record with the ID attribute and ID value, saves the result as a pieceOfDataT record, and puts the pieceOfDataT record into bounded queue queue4;
a pieceOfDataT record is the pieceOfData record after its points have been replaced by the corresponding ID attributes and ID values;
Step 503: the mapping between at least one point in the current pieceOfData record and its titan internal ID has not been loaded into the high-speed index module; the ID conversion module puts the points that have not been loaded into a HashSet and puts the pieceOfData record into bounded queue queue3;
queue3 stores the pieceOfData records taken from bounded queue queue2 in which the mapping between at least one point and its titan internal ID has not been loaded into the high-speed index module.
4. The efficient parallel loading method for keeping titan real-time data consistent according to claim 1, characterized in that step 7 is specifically:
Step 701: judge whether the termination condition is met; if so, all threads end; otherwise, go to step 702;
Step 702: judge whether bounded queue queue3 is full, or whether the time since the data in the HashSet were last loaded exceeds time threshold t; if so, perform step 703; otherwise, sleep for time t1 and return to step 701;
threshold t is a system initialization parameter, set according to actual conditions;
Step 703: each thread of the point loading module loads points from the HashSet and adds the mapping between each point and its titan internal ID to the high-speed index module;
Step 704: the point loading module resets the HashSet and records the current time as the time at which the data in the HashSet were loaded;
Step 705: put all pieceOfData records in bounded queue queue3 into bounded queue queue2 and empty bounded queue queue3; return to step 701.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710390469.4A CN107038260B (en) | 2017-05-27 | 2017-05-27 | Efficient parallel loading method capable of keeping titan real-time data consistency |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107038260A true CN107038260A (en) | 2017-08-11 |
CN107038260B CN107038260B (en) | 2020-03-10 |
Family
ID=59539492
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710390469.4A Active CN107038260B (en) | 2017-05-27 | 2017-05-27 | Efficient parallel loading method capable of keeping titan real-time data consistency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107038260B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103279546A (en) * | 2013-05-13 | 2013-09-04 | 清华大学 | Graph data query method |
WO2014130035A1 (en) * | 2013-02-21 | 2014-08-28 | Bluearc Uk Limited | Object-level replication of cloned objects in a data storage system |
CN106095977A (en) * | 2016-06-20 | 2016-11-09 | 环球大数据科技有限公司 | The distributed approach of a kind of data base and system |
CN106126583A (en) * | 2016-06-20 | 2016-11-16 | 环球大数据科技有限公司 | The collection group strong compatibility processing method of a kind of distributed chart database and system |
Non-Patent Citations (1)
Title |
---|
黄权隆: "HybriG:一种高效处理大量重边的属性图存储架构", 《计算机学报》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109189743A (en) * | 2018-06-26 | 2019-01-11 | 国家计算机网络与信息安全管理中心 | A kind of the super node identification filter method and system of the low consumption of resources towards the real-time diagram data of big flow |
CN109189743B (en) * | 2018-06-26 | 2021-09-28 | 国家计算机网络与信息安全管理中心 | Super node recognition filtering method and system with low resource consumption and oriented to large-flow real-time graph data |
CN112597145A (en) * | 2020-12-29 | 2021-04-02 | 恩亿科(北京)数据科技有限公司 | Real-time data cleaning method, system, electronic equipment and storage medium |
CN112685419A (en) * | 2020-12-31 | 2021-04-20 | 北京赛思信安技术股份有限公司 | Distributed efficient parallel loading method capable of keeping consistency of janusGraph data |
Also Published As
Publication number | Publication date |
---|---|
CN107038260B (en) | 2020-03-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||