CN112711659A - Model calculation method and device based on mass graph data - Google Patents

Model calculation method and device based on mass graph data

Info

Publication number
CN112711659A
Authority
CN
China
Prior art keywords
data
file
graph
database
hdfs
Prior art date
Legal status
Granted
Application number
CN202011625560.8A
Other languages
Chinese (zh)
Other versions
CN112711659B (en)
Inventor
顾凌云
郭志攀
王伟
李海全
Current Assignee
Nanjing Bingjian Information Technology Co ltd
Original Assignee
Nanjing Bingjian Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Bingjian Information Technology Co ltd filed Critical Nanjing Bingjian Information Technology Co ltd
Priority to CN202011625560.8A priority Critical patent/CN112711659B/en
Publication of CN112711659A publication Critical patent/CN112711659A/en
Application granted granted Critical
Publication of CN112711659B publication Critical patent/CN112711659B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The model calculation method and device based on massive graph data import the graph data to be processed from the graph database JanusGraph into the hive database to obtain a data node list and a data relation list, determine a connected graph id for each data node and its corresponding data relation, aggregate the data of the same connected graph based on the connected graph id and push it to hdfs storage, keep the mapping between operation parameters and aggregation files during aggregation and import it into the hive database, and adjust preset thread parameters to obtain target thread parameters for data processing, thereby obtaining a data processing result. By splitting the data along connected graphs in advance, the method prepares the tasks for parallel execution; by screening and converting the data in advance, it reduces the data volume at calculation time so that the data can be loaded into memory; and by slightly modifying the single-machine model python code into spark code, the calculation can be parallelized and the degree of parallelism can be adjusted dynamically according to the computing resources and the requirements of the task.

Description

Model calculation method and device based on mass graph data
Technical Field
The invention relates to the technical field of data processing, in particular to a model calculation method and device based on massive graph data.
Background
In a typical knowledge graph project, the graph model is an important component of graph analysis and mining: it can perform deep analysis such as machine learning and data mining on graph data and better discover the knowledge implied in the data. In practice, however, graph data are highly interconnected and therefore hard to split for graph model calculation. When the data volume is small, all data can be loaded into memory and then calculated, so the impact is limited. Under massive data, however, the data cannot be fully loaded into memory, and the time consumed by single-machine computation is unacceptable. A method is therefore needed that consumes fewer resources and can parallelize the calculation over massive data.
Disclosure of Invention
In order to solve the problems, the invention provides a model calculation method and a model calculation device based on massive graph data.
The embodiment of the invention provides a model calculation method based on massive graph data, which is applied to computer equipment and comprises the following steps:
importing graph data to be processed into a hive database from a graph database JanusGraph to obtain a data node list and a data relation list;
determining, according to the data node list and the data relation list, a connected graph id for each data node and its corresponding data relation;
obtaining a target file based on the connected graph id, and pushing the target file to an hdfs database;
performing data screening on the hdfs file directory corresponding to the target file in the hdfs database to obtain a mapping file, and importing the mapping file into the hive database;
adjusting the preset thread parameters to obtain target thread parameters;
and starting a data processing task according to the target thread parameter, and performing data processing on the mapping file in the hive database to obtain a data processing result.
Optionally, determining a connected graph id for each data node and its corresponding data relation according to the data node list and the data relation list includes:
reading the data node list and the data relation list through the acquired spark code;
and calculating a connected graph id of each data node and the corresponding data relation thereof based on the graph framework of spark.
Optionally, obtaining a target file based on the connectivity graph id, and pushing the target file to an hdfs database, including:
grouping the connected graph ids;
sequentially writing the data nodes and the data relations of each group into the initial file to obtain a target file;
and pushing the target file to an hdfs database.
Optionally, the data screening is performed on the hdfs file directory corresponding to the target file in the hdfs database to obtain a mapping file, and the importing the mapping file into the hive database includes:
defining a data filtering file;
reading the hdfs file directory according to the data filtering file;
converting each file to be processed in the hdfs file directory into an sqlite file and a mapping file of para and sqlite;
pushing the sqlite file and the mapping file to a specified directory of the hive database.
Optionally, adjusting the preset thread parameter to obtain the target thread parameter includes:
and modifying the preset standalone code into a distributed code.
Optionally, starting a data processing task according to the target thread parameter includes:
and starting a computing task based on the distributed code, and submitting the task by using a submit command of spark.
Optionally, the method further comprises:
and verifying the data processing result.
The embodiment of the invention provides a model calculation device based on massive graph data, which is applied to computer equipment and comprises the following functional modules:
the data import module is used for importing the graph data to be processed into the hive database from the graph database JanusGraph to obtain a data node list and a data relationship list;
the connected graph determining module is used for determining, according to the data node list and the data relation list, a connected graph id for each data node and its corresponding data relation;
the file pushing module is used for obtaining a target file based on the connected graph id and pushing the target file to the hdfs database;
the data screening module is used for screening data of the hdfs file directory corresponding to the target file in the hdfs database to obtain a mapping file and importing the mapping file into the hive database;
the parameter adjusting module is used for adjusting the preset thread parameters to obtain target thread parameters;
and the data processing module is used for starting a data processing task according to the target thread parameter and carrying out data processing on the mapping file in the hive database to obtain a data processing result.
Optionally, the connected graph determining module is configured to:
reading the data node list and the data relation list through the acquired spark code;
and calculating a connected graph id of each data node and the corresponding data relation thereof based on the graph framework of spark.
Optionally, the file pushing module is configured to:
grouping the connected graph ids;
sequentially writing the data nodes and the data relations of each group into the initial file to obtain a target file;
and pushing the target file to an hdfs database.
The model calculation method and device based on massive graph data provided by the invention import the graph data to be processed from the graph database JanusGraph into the hive database to obtain a data node list and a data relation list, determine a connected graph id for each data node and its corresponding data relation, aggregate the data of the same connected graph based on the connected graph id and push it to hdfs storage, keep the mapping between operation parameters and aggregation files during aggregation and import it into the hive database, and adjust preset thread parameters to obtain target thread parameters so that a data processing task can be started according to the target thread parameters to process the mapping files in the hive database and obtain a data processing result. With this design, splitting the data along connected graphs in advance prepares the tasks for parallel execution; screening and converting the data in advance reduces the data volume at calculation time so that the data can be loaded into memory; and slightly modifying the single-machine model python code into spark code makes the calculation parallelizable, while the degree of parallelism can be adjusted dynamically according to the computing resources and the requirements of the task.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present invention and should therefore not be regarded as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a flowchart of a model calculation method based on mass graph data according to an embodiment of the present invention.
Fig. 2 is a block diagram of a model calculation device based on mass graph data according to an embodiment of the present invention.
Detailed Description
In order to better understand the technical solutions of the present invention, they are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific features in the embodiments and examples of the present invention are detailed descriptions of the technical solutions rather than limitations of them, and the technical features in the embodiments and examples may be combined with each other without conflict.
The inventor finds that the common single-machine, in-memory graph model calculation scheme proceeds as follows: a. load a small amount of data into memory; b. write the corresponding analysis code to process the data in memory; c. start one or more threads to run the analysis code; d. output the results.
However, the prior art has the following disadvantages: it can only process a small amount of data and must load all of it into memory, which consumes a lot of resources; it can run multiple threads in parallel, but only on a single machine; and when the data volume is large, it cannot use a big-data computing engine to scale computing resources linearly.
To address these problems, the inventor provides a model calculation method and device based on massive graph data. Referring first to fig. 1, a model calculation method based on massive graph data is shown; it can be applied to a computer device and is implemented as described in the following steps S11-S16.
Step S11: importing the graph data to be processed from the graph database JanusGraph into the hive database to obtain a data node list and a data relation list.
In the data export step, data are exported from the graph database JanusGraph (a distributed graph database commonly used for small-batch OLTP queries) to Hive (a big-data distributed data warehouse). The purpose of the export is to keep the subsequent full-scale analysis (OLAP) consistent with the existing data. The export produces two Hive tables: a data node list (nodes table) and a data relation list (relations table).
nodes table: (table contents shown as images in the original publication)
relations table: (table contents shown as images in the original publication)
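As a minimal sketch of what consuming these two exported tables might look like in the later steps (the database, table and column names here are hypothetical, since the patent only shows the table layouts as images):

```python
# Sketch: read the exported nodes and relations tables from hive with spark.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("read-exported-graph")
         .enableHiveSupport()              # lets spark read the hive database directly
         .getOrCreate())

# Hypothetical table names for the two exported lists.
nodes_df = spark.table("graph.nodes")          # data node list
relations_df = spark.table("graph.relations")  # data relation list

# A simple consistency check after the export: record counts per table.
print("nodes:", nodes_df.count(), "relations:", relations_df.count())
```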
Step S12: determining, according to the data node list and the data relation list, a connected graph id for each data node and its corresponding data relation.
Step S13: obtaining a target file based on the connected graph id, and pushing the target file to the hdfs database.
Step S14: performing data screening on the hdfs file directory corresponding to the target file in the hdfs database to obtain a mapping file, and importing the mapping file into the hive database.
Step S15: adjusting the preset thread parameters to obtain the target thread parameters.
Step S16: starting a data processing task according to the target thread parameters, and performing data processing on the mapping file in the hive database to obtain a data processing result.
In this way, the graph data to be processed are imported from the graph database JanusGraph into the hive database to obtain a data node list and a data relation list, a connected graph id is determined for each data node and its corresponding data relation, the data of the same connected graph are aggregated and pushed to hdfs storage based on the connected graph id, the mapping between operation parameters and aggregation files is kept during aggregation and imported into the hive database, and preset thread parameters are adjusted to obtain target thread parameters so that a data processing task can be started accordingly to process the mapping files in the hive database and obtain a data processing result. With this design, splitting the data along connected graphs in advance prepares the tasks for parallel execution; screening and converting the data in advance reduces the data volume at calculation time so that the data can be loaded into memory; and slightly modifying the single-machine model python code into spark code makes the calculation parallelizable, while the degree of parallelism can be adjusted dynamically according to the computing resources and the requirements of the task.
Further, determining a connected graph id for each data node and its corresponding data relation according to the data node list and the data relation list in step S12 includes: reading the data node list and the data relation list through the acquired spark code; and calculating the connected graph id of each data node and its corresponding data relation based on spark's graph framework. It can be understood that the reason the conventional method cannot be parallelized is that the graph data are mutually associated and are therefore never split; the splitting in this step uses the connected-graph property of the graph data to calculate a connected graph id for each node.
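The patent does not name the spark graph framework or the table columns; as one possible reading, the sketch below assumes GraphFrames (which must be supplied to spark as a package) and hypothetical id/src/dst columns:

```python
# Sketch: compute a connected graph id per node and per relation with GraphFrames (assumed framework).
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("connected-graph-id").enableHiveSupport().getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/cc_checkpoint")   # required by connectedComponents

# Hypothetical column names: the nodes table exposes "id", the relations table "src" and "dst".
vertices = spark.sql("SELECT id FROM graph.nodes")
edges = spark.sql("SELECT src, dst FROM graph.relations")

g = GraphFrame(vertices, edges)
components = g.connectedComponents()              # adds a "component" column: the connected graph id
node_cc = components.select("id", "component")

# Propagate the same id to each relation through its source node.
rel_cc = edges.join(node_cc, edges.src == node_cc.id).select("src", "dst", "component")
```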
Further, obtaining a target file based on the connected graph id and pushing the target file to the hdfs database, as described in step S13, includes: grouping the connected graph ids; sequentially writing the data nodes and data relations of each group into an initial file to obtain a target file; and pushing the target file to the hdfs database.
It can be understood that the purpose of data aggregation is to write the nodes and relations of the same connected graph into the same file, in preparation for the subsequent calculation. Meanwhile, to prevent a large number of small files from being generated because some connected graphs are too small, a minimum number of records per file needs to be set.
For example, Spark code is written to read the result of step S12 and group it by connected graph id. For each group, the nodes are written to the file first, one record per line, each line starting with "node|"; the relations are then written, each line starting with "relations|". To prevent memory overflow, the cache is flushed to disk every 10,000 lines. If the number of records written to a file has not reached the threshold, the code does not switch to a new file but continues writing to the original one. The final file is formed as follows:
(example aggregated file contents shown as an image in the original publication)
When the file write is completed, the file is pushed to hdfs (a distributed file storage system).
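A minimal sketch of this aggregation step under stated assumptions: the "node|"/"relations|" prefixes and the 10,000-line flush interval follow the example above, while the minimum-record threshold, the file naming and the hdfs target path are hypothetical:

```python
# Sketch: write the nodes and relations of each connected graph into aggregated files,
# flushing every 10,000 lines and reusing a file until it reaches a minimum record count.
import subprocess

MIN_RECORDS_PER_FILE = 100_000   # assumed threshold to avoid producing many tiny files
FLUSH_EVERY = 10_000             # flush cadence from the example above


def write_groups(groups, out_dir):
    """groups: iterable of (component_id, node_lines, relation_lines), one entry per connected graph."""
    file_idx, written_in_file, total_lines = 0, 0, 0
    out = open(f"{out_dir}/part-{file_idx}.txt", "w", encoding="utf-8")
    for _component_id, node_lines, relation_lines in groups:
        for prefix, lines in (("node|", node_lines), ("relations|", relation_lines)):
            for line in lines:
                out.write(prefix + line + "\n")
                total_lines += 1
                written_in_file += 1
                if total_lines % FLUSH_EVERY == 0:
                    out.flush()              # flush the cache to disk to prevent memory overflow
        # only switch to a new file once the minimum record count has been reached
        if written_in_file >= MIN_RECORDS_PER_FILE:
            out.close()
            file_idx, written_in_file = file_idx + 1, 0
            out = open(f"{out_dir}/part-{file_idx}.txt", "w", encoding="utf-8")
    out.close()
    # push the finished files to hdfs (target path is hypothetical)
    subprocess.run(["hdfs", "dfs", "-put", out_dir, "/data/connected_graphs/"], check=True)
```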
For some possible examples, performing data screening in step S14 on the hdfs file directory corresponding to the target file in the hdfs database to obtain a mapping file, and importing the mapping file into the hive database, includes: defining a data filtering file; reading the hdfs file directory according to the data filtering file; converting each file to be processed in the hdfs file directory into an sqlite file and a mapping file of para and sqlite; and pushing the sqlite file and the mapping file to a specified directory of the hive database. The purpose of this step is to filter the data and convert the files of step S13 into the data required for calculation. Because a typical model only uses part of the node and relation data, the unused data are screened out in advance to reduce the amount of data to be calculated.
For example, the data filter file is organized as follows (its contents are shown as images in the original publication).
Code is written to read the configuration file and, using spark, to read the hdfs file directory obtained in step S13; each file under the directory is converted into an sqlite file. Each sqlite file contains two tables, nodes and relations, whose data and indexes conform to the definitions in the filter file above. The sqlite file name is the original file name with a ".db" suffix, and during the conversion the para associated with the source file also needs to be written into the file. A commit is issued for every 10,000 converted records. Each converted file yields two outputs: the converted sqlite file and the mapping file of para and sqlite; both are pushed to the directory specified on hdfs. The mapping file of para and sqlite is then imported into hive, where the hive table has two columns: para and db.
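A minimal sketch of converting one aggregated file into an sqlite file, assuming the "node|"/"relations|" line format from the example above; the filtered column layout and the field separator are hypothetical, since the original filter file is only shown as images:

```python
# Sketch: convert one aggregated file into an sqlite file with nodes/relations tables,
# keeping only the filtered fields and committing every 10,000 records.
import sqlite3

COMMIT_EVERY = 10_000


def convert(txt_path, db_path):
    """db_path is the original file name with a ".db" suffix."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    # Hypothetical filtered columns; real column sets would come from the data filter file.
    cur.execute("CREATE TABLE nodes (id TEXT, name TEXT, type TEXT)")
    cur.execute("CREATE TABLE relations (src TEXT, dst TEXT, label TEXT)")
    cur.execute("CREATE INDEX idx_nodes_id ON nodes (id)")
    cur.execute("CREATE INDEX idx_relations_src ON relations (src)")

    pending = 0
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            prefix, _, payload = line.rstrip("\n").partition("|")
            fields = payload.split("\t")                 # assumed field separator
            if prefix == "node":
                cur.execute("INSERT INTO nodes VALUES (?, ?, ?)", fields[:3])
            elif prefix == "relations":
                cur.execute("INSERT INTO relations VALUES (?, ?, ?)", fields[:3])
            pending += 1
            if pending % COMMIT_EVERY == 0:
                conn.commit()                            # commit every 10,000 records
    conn.commit()
    conn.close()
    # The para-to-sqlite mapping file would be written alongside and both files
    # pushed to the hdfs directory; that part is omitted here.
```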
Further, adjusting the preset thread parameter described in step S15 to obtain the target thread parameter includes: modifying the preset standalone code into distributed code. The purpose of this step is to convert the analyst-written standalone code into distributed code.
In some embodiments, starting a data processing task according to the target thread parameter as described in step S16 includes: starting a computing task based on the distributed code, and submitting the task using spark's submit command.
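As a minimal sketch of the conversion from standalone to distributed code (the mapping table name, the model module and the spark-submit options are hypothetical; only the pattern of running the single-machine model once per sqlite file is taken from the description above):

```python
# Sketch: run the single-machine model once per sqlite file, in parallel, through spark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("graph-model").enableHiveSupport().getOrCreate()


def run_model_on_row(row):
    # The original standalone python model applied to one small sqlite file; because each
    # file holds complete connected graphs, no communication between tasks is needed.
    # Fetching the sqlite file from hdfs to the local executor is omitted here.
    from my_model import analyse_sqlite      # hypothetical module wrapping the standalone model
    return (row.para, analyse_sqlite(row.db))


# Hive mapping table with two columns, para and db, as described above.
mapping = spark.table("graph.para_db_mapping")
results = mapping.rdd.map(run_model_on_row).collect()

# The job itself would be submitted with spark's submit command, for example (options illustrative):
#   spark-submit --master yarn --deploy-mode cluster --num-executors 20 --executor-cores 4 model_job.py
```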
On this basis, the method may further include the step of verifying the data processing result. For example, after the calculation is completed, a selected portion of the data is compared against the single-machine results for verification.
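A minimal sketch of such a verification, comparing a sampled portion of the distributed results against a single-machine reference run (the result table, reference file, join key and score column are all assumptions):

```python
# Sketch: spot-check the distributed results against a single-machine reference run.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("verify-results").enableHiveSupport().getOrCreate()

distributed = spark.table("graph.model_results")                       # hypothetical result table
reference = spark.read.csv("file:///tmp/single_machine_results.csv",   # hypothetical reference
                           header=True)

sample = distributed.sample(fraction=0.01, seed=42)
joined = sample.join(reference, on="para", how="inner")                # "para" as assumed join key
mismatches = joined.filter(joined["score"] != joined["ref_score"]).count()
print(f"checked {joined.count()} sampled records, found {mismatches} mismatches")
```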
Based on the same inventive concept as above, please refer to fig. 2, which shows a model calculation apparatus 200 based on mass graph data. The apparatus is applied to a computer device and includes the following modules:
the data import module 210 is configured to import the graph data to be processed from the graph database JanusGraph into the hive database to obtain a data node list and a data relationship list;
a connected graph determining module 220, configured to determine, according to the data node list and the data relation list, a connected graph id for each data node and its corresponding data relation;
the file pushing module 230 is configured to obtain a target file based on the connected graph id, and push the target file to an hdfs database;
a data screening module 240, configured to perform data screening on an hdfs file directory corresponding to the target file in the hdfs database to obtain a mapping file, and import the mapping file into the hive database;
a parameter adjusting module 250, configured to adjust a preset thread parameter to obtain a target thread parameter;
and the data processing module 260 is configured to start a data processing task according to the target thread parameter, and perform data processing on the mapping file in the hive database to obtain a data processing result.
Optionally, the connected graph determining module 220 is configured to: read the data node list and the data relation list through the acquired spark code, and calculate a connected graph id for each data node and its corresponding data relation based on spark's graph framework.
Optionally, the file pushing module 230 is configured to: grouping the connected graph ids; sequentially writing the data nodes and the data relations of each group into the initial file to obtain a target file; and pushing the target file to an hdfs database.
To sum up, the model calculation method and device based on massive graph data of the invention import the graph data to be processed from the graph database JanusGraph into the hive database to obtain a data node list and a data relation list, determine a connected graph id for each data node and its corresponding data relation, aggregate the data of the same connected graph based on the connected graph id and push it to hdfs storage, keep the mapping between operation parameters and aggregation files during aggregation and import it into the hive database, and adjust preset thread parameters to obtain target thread parameters for data processing, thereby obtaining a data processing result. Splitting the data along connected graphs in advance therefore prepares the tasks for parallel execution; screening and converting the data in advance reduces the data volume at calculation time so that the data can be loaded into memory; and slightly modifying the single-machine model python code into spark code makes the calculation parallelizable, while the degree of parallelism can be adjusted dynamically according to the computing resources and the requirements of the task.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A model calculation method based on massive graph data is applied to computer equipment, and the method comprises the following steps:
importing graph data to be processed into a hive database from a graph database JanusGraph to obtain a data node list and a data relation list;
determining each data node and a connected graph id of a corresponding data relationship according to the data node list and the data relationship list;
obtaining a target file based on the connected graph id, and pushing the target file to an hdfs database;
performing data screening on the hdfs file directory corresponding to the target file in the hdfs database to obtain a mapping file, and importing the mapping file into the hive database;
adjusting the preset thread parameters to obtain target thread parameters;
and starting a data processing task according to the target thread parameter, and performing data processing on the mapping file in the hive database to obtain a data processing result.
2. The data processing method according to claim 1, wherein determining a connected graph id of each data node and its corresponding data relationship according to the data node list and the data relationship list comprises:
reading the data node list and the data relation list through the acquired spark code;
and calculating a connected graph id of each data node and the corresponding data relation thereof based on the graph framework of spark.
3. The data processing method according to claim 1, wherein obtaining a target file based on the connectivity graph id and pushing the target file to an hdfs database comprises:
grouping the connected graph ids;
sequentially writing the data nodes and the data relations of each group into the initial file to obtain a target file;
and pushing the target file to an hdfs database.
4. The data processing method of claim 1, wherein the data screening is performed on an hdfs file directory corresponding to the target file in the hdfs database to obtain a mapping file, and the importing the mapping file into the hive database comprises:
defining a data filtering file;
reading the hdfs file directory according to the data filtering file;
converting each file to be processed in the hdfs file directory into an sqlite file and a mapping file of para and sqlite;
pushing the sqlite file and the mapping file to a specified directory of the hive database.
5. The data processing method of claim 1, wherein adjusting the preset thread parameter to obtain the target thread parameter comprises:
and modifying the preset standalone code into a distributed code.
6. The data processing method of claim 5, wherein initiating a data processing task according to the target thread parameter comprises:
and starting a computing task based on the distributed code, and submitting the task by using a submit command of spark.
7. The data processing method of claim 1, wherein the method further comprises:
and verifying the data processing result.
8. A model calculation device based on massive graph data is applied to computer equipment and comprises the following functional modules:
the data import module is used for importing the graph data to be processed into the hive database from the graph database JanusGraph to obtain a data node list and a data relationship list;
the connected graph determining module is used for determining each data node and a connected graph id of a corresponding data relation according to the data node list and the data relation list;
the file pushing module is used for obtaining a target file based on the connected graph id and pushing the target file to the hdfs database;
the data screening module is used for screening data of the hdfs file directory corresponding to the target file in the hdfs database to obtain a mapping file and importing the mapping file into the hive database;
the parameter adjusting module is used for adjusting the preset thread parameters to obtain target thread parameters;
and the data processing module is used for starting a data processing task according to the target thread parameter and carrying out data processing on the mapping file in the hive database to obtain a data processing result.
9. The data processing apparatus of claim 8, wherein the connected graph determining module is configured to:
reading the data node list and the data relation list through the acquired spark code;
and calculating a connected graph id of each data node and the corresponding data relation thereof based on the graph framework of spark.
10. The data processing apparatus of claim 8, wherein the file push module is configured to:
grouping the connected graph ids;
sequentially writing the data nodes and the data relations of each group into the initial file to obtain a target file;
and pushing the target file to an hdfs database.
CN202011625560.8A 2020-12-31 2020-12-31 Model calculation method and device based on mass graph data Active CN112711659B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011625560.8A CN112711659B (en) 2020-12-31 2020-12-31 Model calculation method and device based on mass graph data


Publications (2)

Publication Number Publication Date
CN112711659A true CN112711659A (en) 2021-04-27
CN112711659B CN112711659B (en) 2024-03-15

Family

ID=75547652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011625560.8A Active CN112711659B (en) 2020-12-31 2020-12-31 Model calculation method and device based on mass graph data

Country Status (1)

Country Link
CN (1) CN112711659B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105849764A (en) * 2013-10-25 2016-08-10 西斯摩斯公司 Systems and methods for identifying influencers and their communities in a social data network
US20150169758A1 (en) * 2013-12-17 2015-06-18 Luigi ASSOM Multi-partite graph database
CN105335230A (en) * 2014-07-30 2016-02-17 阿里巴巴集团控股有限公司 Service processing method and apparatus
CN104809168A (en) * 2015-04-06 2015-07-29 华中科技大学 Partitioning and parallel distribution processing method of super-large scale RDF graph data
CN110134516A (en) * 2019-05-16 2019-08-16 深圳前海微众银行股份有限公司 Finance data processing method, device, equipment and computer readable storage medium
CN111460234A (en) * 2020-03-26 2020-07-28 平安科技(深圳)有限公司 Graph query method and device, electronic equipment and computer readable storage medium
CN111428095A (en) * 2020-06-11 2020-07-17 上海冰鉴信息科技有限公司 Graph data quality verification method and graph data quality verification device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
TODOR IVANOV et al.: "Evaluating Hive and Spark SQL with BigBench", arXiv, 28 December 2015 (2015-12-28), pages 1-10 *
刘超; 唐郑望; 姚宏; 胡成玉; 梁庆中: "Graph data processing technology on cloud platforms" (云平台下图数据处理技术), Journal of Computer Applications (计算机应用), vol. 35, no. 01, 10 January 2015 (2015-01-10), pages 43-47 *
唐德权; 张波云: "Research on path-based frequent subgraph mining algorithms" (基于路径的频繁子图挖掘算法研究), Computer Engineering & Science (计算机工程与科学), vol. 41, no. 12, 15 December 2019 (2019-12-15), pages 2223-2230 *

Also Published As

Publication number Publication date
CN112711659B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
JP5298117B2 (en) Data merging in distributed computing
US9471607B2 (en) Data loading tool
US9460188B2 (en) Data warehouse compatibility
US9356966B2 (en) System and method to provide management of test data at various lifecycle stages
US8949222B2 (en) Changing the compression level of query plans
Kunda et al. A comparative study of nosql and relational database
CN108197306B (en) SQL statement processing method and device, computer equipment and storage medium
US20110307502A1 (en) Extensible event-driven log analysis framework
US20120078904A1 (en) Approximate Index in Relational Databases
DE112011101200T5 (en) Column-oriented memory representations of data records
CN106528898A (en) Method and device for converting data of non-relational database into relational database
Mehmood et al. Performance analysis of not only SQL semi-stream join using MongoDB for real-time data warehousing
CN109669975B (en) Industrial big data processing system and method
Sinthong et al. Aframe: Extending dataframes for large-scale modern data analysis
CN111966760A (en) Hive data warehouse-based test data generation method and device
CN107430633B (en) System and method for data storage and computer readable medium
CN110851515A (en) Big data ETL model execution method and medium based on Spark distributed environment
CN112711659A (en) Model calculation method and device based on mass graph data
Martins et al. NoSQL comparative performance study
CN109543079B (en) Data query method and device, computing equipment and storage medium
Dai et al. The Hadoop stack: new paradigm for big data storage and processing
WO2021061183A1 (en) Shuffle reduce tasks to reduce i/o overhead
US20220222229A1 (en) Automated database modeling
CN117390040B (en) Service request processing method, device and storage medium based on real-time wide table
Huang et al. Cloud Based Test Coverage Service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant