CN106055676A - Data source tracing method and system based on big data model analysis platform - Google Patents

Data source tracing method and system based on big data model analysis platform

Info

Publication number
CN106055676A
Authority
CN
China
Prior art keywords
source
tracing
file
model
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610395246.2A
Other languages
Chinese (zh)
Other versions
CN106055676B (en)
Inventor
林劼
郝鹏飞
彭世锦
李年华
陆文斌
王晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201610395246.2A
Publication of CN106055676A
Application granted
Publication of CN106055676B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Abstract

The invention discloses a data source tracing method and system based on a big data model analysis platform. The method comprises: S1, a model workflow analysis step of analyzing the input nodes, output nodes and action nodes of a model workflow formed by models on a Hadoop platform, and obtaining a unique identifier of each node; S2, a source trace information metadata model design step of describing one source trace file for each model workflow; S3, a source trace information storage step of building an index for the source trace files, with the index information stored in a cache database and the index file stored on HDFS; and S4, a data source tracing step of judging whether the data generation process is to be traced and, if it is not, obtaining the address of the source trace file by querying the index information. The method addresses the problem that traditional data source tracing methods are not applicable on a big data platform; indexing the source trace files reduces input/output operations and improves query speed.

Description

Data source tracing method and system based on a big data model analysis platform
Technical field
The present invention relates to the technical field of data source tracing, and in particular to a data source tracing method and system based on a big data model analysis platform.
Background art
The big data model analysis platform is a platform built on a Hadoop cluster for building, designing, developing and trading models. The system provides basic models, on top of which users can build their own models with a visual designer and use them to analyze the industry data provided by the platform. Because the underlying storage and computation are supported by the Hadoop cluster, the platform is a model analysis platform built on a big data environment. The model design diagram is shown in Fig. 1.
In recent years, with the development of computers and the mobile Internet, information of all kinds has grown explosively. This information falls broadly into two classes: original recorded data, and data derived from it through several processing steps. What users usually see is the result data. For users, the processing history of such data, and hence its credibility, is unknown, and sometimes the result data bears no apparent relation to the original data. Users therefore have to care about where result data comes from, which gave rise to data source tracing technology.
Data provenance is descriptive information about the origin of data and the process by which the data was generated. This information plays an important role in many areas, such as debugging data and transformations, auditing, assessing data quality and trustworthiness, and implementing access control over data. In China, data source tracing has been studied relatively little. Dai Chaofan et al. studied data source tracing in the data warehouse field; Wang Liwei et al. mainly studied data tracking in scientific workflow service frameworks and proposed a data source tracing method based on bidirectional pointers in an object proxy database; Li Yazi studied annotation schemes and description models for data provenance and introduced the 7W model; Chen Ying et al. designed a provenance tracking model based on the DNA double helix structure. Abroad, many universities and research institutions also treat data provenance as a research topic. Grigoris Karvounarakis proposed ProQL, a provenance query language based on tuples and semirings, to address problems such as provenance maintenance, storage and querying; Tanu Malik et al. described collecting provenance in distributed applications and showed experimentally that a decentralized provenance management architecture is feasible and effectively improves the efficiency of provenance queries.
Traditional source tracing methods mainly target databases and scientific workflow computation. On a big data platform, however, both source data and result data are stored on HDFS, so they cannot be marked directly by annotating tuples.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a data source tracing method and system based on a big data model analysis platform, which solves the problem of tracing the source of data processed by combinations of multiple models on the big data model analysis platform.
The object of the present invention is achieved through the following technical solution: a data source tracing method based on a big data model analysis platform, comprising the following steps:
S1. Model workflow analysis: analyze the input node, output node and action nodes of the model workflow formed by models on the Hadoop platform, and obtain the unique identifier of each node;
S2. Source trace information metadata model design: describe one source trace file for each model workflow;
S3. Source trace information storage: build an index for the source trace files; the index information is stored in a cache database and the index file is stored on HDFS (the Hadoop Distributed File System);
S4. Data source tracing: judge whether the data generation process is to be traced; if it is not, obtain the address of the source trace file by querying the index information.
Step S1 includes the following sub-steps:
S11. Scan the model workflow and find its first action node; take the input file path of the first action node as the input file path of the model workflow;
Find the last action node of the model workflow and take its output file path as the output file path of the model workflow; save the input file path and the output file path of the model workflow;
S12. Detect all action nodes of the model workflow, obtain the unique identifier and name of the model workflow, and cache them in an adjacency list.
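Sub-steps S11 and S12 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `ActionNode` class, its field names, the sample paths and the assumption that the node list is already ordered from first to last action node are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ActionNode:
    """One model in the workflow (all field names are hypothetical)."""
    node_id: str
    name: str
    input_path: str
    output_path: str
    next_ids: list = field(default_factory=list)  # downstream action nodes


def analyze_workflow(nodes):
    """Return (input_path, output_path, adjacency) for an ordered node list.

    Assumes `nodes` is ordered so that the first entry is the first action
    node and the last entry is the last action node of the workflow.
    """
    if not nodes:
        raise ValueError("workflow has no action nodes")
    # S11: the workflow's input/output paths come from its first and last action nodes.
    input_path = nodes[0].input_path
    output_path = nodes[-1].output_path
    # S12: cache each node's identifier, name and successors in an adjacency list.
    adjacency = {n.node_id: {"name": n.name, "next": list(n.next_ids)} for n in nodes}
    return input_path, output_path, adjacency


workflow = [
    ActionNode("m1", "clean", "/data/raw", "/tmp/m1", ["m2"]),
    ActionNode("m2", "cluster", "/tmp/m1", "/data/result", []),
]
in_path, out_path, adj = analyze_workflow(workflow)
```

Here the adjacency list is a plain dictionary keyed by node identifier; any structure that records each node's successors would serve the same purpose.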
The method of describing one source trace file for each model workflow is:
S21. Scan the model workflow and obtain its control flow nodes, input file path and output file path;
S22. Detect all action nodes of the model workflow and the relations between them, and cache this information in an adjacency list;
S23. Write the cached information into the source trace file and save the source trace file on HDFS;
S24. Save the input file path and output file path of the model workflow and the address of the source trace file in the cache database as key-value pairs.
The source trace file is represented by a tuple W = {ID, I, O, M, T} (the original text calls it a quadruple, although five elements are listed), where ID is the unique identifier of the model workflow, I is the input node of the model workflow, O is the output node of the model workflow, M is the set of action nodes of the model workflow, and T is the timestamp at which the model workflow was built.
The set of action nodes is M = {m1, m2, ..., mn}, where mi denotes one model and each model is regarded as one action node. <mi, mj> denotes that the output of mi serves as the input of mj, so that M and the relations between the action nodes in M form a directed acyclic graph.
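As a rough illustration of the record W = {ID, I, O, M, T}, the sketch below holds the five elements in a small data class and checks that the <mi, mj> relations really form a directed acyclic graph. The class name, attribute names and sample values are assumptions made for this example; the depth-first cycle check is a standard technique and is not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class TraceRecord:
    """W = {ID, I, O, M, T}; attribute names are illustrative only."""
    workflow_id: str   # ID: unique identifier of the model workflow
    input_node: str    # I: input node
    output_node: str   # O: output node
    models: list       # M: action-node identifiers m1..mn
    edges: list        # pairs (mi, mj): the output of mi is the input of mj
    timestamp: float   # T: time the workflow was built


def is_acyclic(record):
    """Depth-first search with three states; True if the edges form a DAG."""
    successors = {m: [] for m in record.models}
    for src, dst in record.edges:
        successors[src].append(dst)
    state = {m: 0 for m in record.models}  # 0 = unvisited, 1 = in progress, 2 = done

    def visit(node):
        if state[node] == 1:
            return False  # reached a node still on the current path: cycle
        if state[node] == 2:
            return True
        state[node] = 1
        ok = all(visit(nxt) for nxt in successors[node])
        state[node] = 2
        return ok

    return all(visit(m) for m in record.models)


w = TraceRecord("wf-001", "/data/raw", "/data/result",
                ["m1", "m2", "m3"],
                [("m1", "m2"), ("m1", "m3"), ("m2", "m3")],
                1464912000.0)
cyclic = TraceRecord("wf-002", "/in", "/out", ["a", "b"],
                     [("a", "b"), ("b", "a")], 0.0)
```

A check of this kind would reject a record whose relations loop back on themselves before it is written out as a source trace file.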
The address of the source trace file is obtained by querying the cache database with the output file path of the model workflow.
In step S4, if the data generation process is to be traced, the address of the source trace file is obtained by querying the index information, the source trace file is retrieved from that address, a source trace graph is built, and the generation process of the data is reproduced.
In step S4, tracing the data generation process includes the following sub-steps:
S51. Obtain the address of the source trace file by querying the index information;
S52. Read the source trace file stored on HDFS from that address and cache it in an adjacency list;
S53. Read all action nodes in the source trace file and the relations between them, construct a directed acyclic graph from the adjacency list, and reproduce the generation process of the data.
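Sub-steps S51 to S53 might look like the following sketch. Two dictionaries stand in for the cache database and for the files on HDFS, and "reproducing the generation process" is interpreted here as listing the models in an order consistent with the data flow (a topological order). That interpretation, along with every name and path below, is an assumption made for illustration.

```python
from collections import deque


def reproduce_generation(index, trace_store, output_path):
    """Return the action nodes in an order consistent with how data flowed."""
    address = index[output_path]      # S51: look up the source trace file address
    trace = trace_store[address]      # S52: read the source trace file
    adjacency = {m: [] for m in trace["models"]}
    indegree = {m: 0 for m in trace["models"]}
    for src, dst in trace["edges"]:   # S53: rebuild the directed graph
        adjacency[src].append(dst)
        indegree[dst] += 1
    # Kahn's algorithm: repeatedly emit nodes whose inputs are all produced.
    order = []
    queue = deque(sorted(m for m, d in indegree.items() if d == 0))
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in adjacency[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(adjacency):
        raise ValueError("trace file does not describe an acyclic graph")
    return order


index = {"/data/result": "/trace/wf-001.xml"}
trace_store = {
    "/trace/wf-001.xml": {
        "models": ["m1", "m2", "m3"],
        "edges": [("m1", "m2"), ("m2", "m3")],
    }
}
steps = reproduce_generation(index, trace_store, "/data/result")
```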
A data source tracing system based on a big data model analysis platform, comprising:
a model workflow analysis module, for analyzing the input path, output path and action nodes of the model workflow formed by models on the Hadoop platform, and obtaining the unique identifier of each model in the model workflow;
a source trace information metadata model design module, for describing one source trace file for each model workflow;
a source trace information storage module, for building an index for the source trace files, storing the index information in the cache database and the index file on HDFS;
a data source tracing module, for obtaining the address of the source trace file by querying the index information, retrieving the source trace file from that address, building the source trace graph, and reproducing the generation process of the data.
The beneficial effects of the invention are: the invention overcomes the problem that traditional data source tracing methods are not applicable on a big data platform, and builds an index for the source trace files, which reduces I/O (input/output) operations and improves query speed.
Brief description of the drawings
Fig. 1 is a model workflow diagram of an existing big data model analysis platform;
Fig. 2 is a flow chart of the data source tracing method of the present invention;
Fig. 3 is a flow chart of designing the source trace information metadata model in the present invention;
Fig. 4 is a flow chart of data source tracing in the present invention;
Fig. 5 is a schematic diagram of the data source tracing system of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to the following.
As shown in Fig. 2, a data source tracing method based on a big data model analysis platform comprises the following steps:
S1. Model workflow analysis: a model workflow is a workflow that runs on the Hadoop platform and is composed of control flow nodes (for example, a start node and an end node) and action nodes. Analyze the input node, output node and action nodes of the model workflow formed by models on the Hadoop platform, and obtain the unique identifier of each node.
Step S1 includes the following sub-steps:
S11. Scan the model workflow and find its first action node; take the input file path of the first action node as the input file path of the model workflow;
Find the last action node of the model workflow and take its output file path as the output file path of the model workflow; save the input file path and the output file path of the model workflow;
S12. Detect all action nodes of the model workflow, obtain the unique identifier and name of the model workflow, and cache them in an adjacency list.
S2. Source trace information metadata model design: describe one XML-based source trace file for each model workflow.
As shown in Fig. 2, the method of describing one XML-based source trace file for each model workflow is:
S21. Scan the model workflow and obtain its control flow nodes, input file path and output file path;
S22. Detect all action nodes of the model workflow and the relations between them, and cache this information in an adjacency list;
S23. Write the cached information into the source trace file and save the source trace file on HDFS;
S24. Save the input file path and output file path of the model workflow and the address of the source trace file in the cache database as key-value pairs.
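The storage part of these sub-steps (S23 and S24) can be sketched as below. A temporary local directory stands in for HDFS and a plain dictionary stands in for the cache database; the `in:`/`out:` key prefixes, the file naming and both function names are hypothetical choices, since the text only says that the paths and the trace file address are stored as key-value pairs.

```python
import os
import tempfile


def store_trace(trace_text, workflow_id, input_path, output_path, hdfs_root, cache_db):
    """Write the source trace file and index its address by workflow paths."""
    address = os.path.join(hdfs_root, f"{workflow_id}.xml")
    with open(address, "w", encoding="utf-8") as fh:  # S23: persist the trace file
        fh.write(trace_text)
    # S24: key-value pairs mapping the workflow's paths to the trace file address.
    cache_db[f"in:{input_path}"] = address
    cache_db[f"out:{output_path}"] = address
    return address


def find_trace_address(cache_db, output_path):
    """Look up the trace file address from a workflow output path."""
    return cache_db.get(f"out:{output_path}")


cache_db = {}                   # stand-in for the cache database
hdfs_root = tempfile.mkdtemp()  # stand-in for a directory on HDFS
addr = store_trace("<trace/>", "wf-001", "/data/raw", "/data/result",
                   hdfs_root, cache_db)
```

Keeping the address in a small key-value store means a lookup by output path does not have to scan the trace files themselves, which is the I/O saving the method describes.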
The source trace file is represented by a tuple W = {ID, I, O, M, T}, where ID is the unique identifier of the model workflow, I is the input node of the model workflow, O is the output node of the model workflow, M is the set of action nodes of the model workflow, and T is the timestamp at which the model workflow was built.
The set of action nodes is M = {m1, m2, ..., mn}, where mi denotes one model and each model is regarded as one action node. <mi, mj> denotes that the output of mi serves as the input of mj, so that M and the relations between the action nodes in M form a directed acyclic graph.
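Since the embodiment describes the source trace file as XML-based, one possible serialization of W is sketched below with Python's standard `xml.etree.ElementTree` module. The patent does not give a schema, so the element and attribute names (`trace`, `input`, `output`, `models`, `edges`, `from`, `to`) are invented for this example.

```python
import xml.etree.ElementTree as ET


def trace_to_xml(workflow_id, input_node, output_node, models, edges, timestamp):
    """Serialize W = {ID, I, O, M, T} to an XML string (hypothetical schema)."""
    root = ET.Element("trace", {"id": workflow_id, "timestamp": str(timestamp)})
    ET.SubElement(root, "input").text = input_node
    ET.SubElement(root, "output").text = output_node
    models_el = ET.SubElement(root, "models")
    for m in models:
        ET.SubElement(models_el, "model", {"id": m})
    edges_el = ET.SubElement(root, "edges")
    for src, dst in edges:
        # <mi, mj>: the output of mi is the input of mj
        ET.SubElement(edges_el, "edge", {"from": src, "to": dst})
    return ET.tostring(root, encoding="unicode")


def xml_to_edges(xml_text):
    """Recover the (mi, mj) pairs from a serialized source trace file."""
    root = ET.fromstring(xml_text)
    return [(e.get("from"), e.get("to")) for e in root.iter("edge")]


xml_text = trace_to_xml("wf-001", "/data/raw", "/data/result",
                        ["m1", "m2"], [("m1", "m2")], 1464912000)
```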
S3. Source trace information storage: build an index for the source trace files; the index information is stored in a cache database and the index file is stored on HDFS.
S4. Data source tracing: judge whether the data generation process is to be traced. If it is not, obtain the address of the source trace file by querying the index information; if it is, obtain the address of the source trace file by querying the index information, retrieve the source trace file from that address, build the source trace graph and reproduce the generation process of the data, as shown in Fig. 4.
The address of the source trace file is obtained by querying the cache database with the output file path of the model workflow.
In step S4, tracing the data generation process includes the following sub-steps:
S51. Obtain the address of the source trace file by querying the index information;
S52. Read the source trace file stored on HDFS from that address and cache it in an adjacency list;
S53. Read all action nodes in the source trace file and the relations between them, construct a directed acyclic graph from the adjacency list, and reproduce the generation process of the data.
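The two branches of step S4 can be illustrated with the sketch below: when the generation process is not to be traced, only the address is returned; otherwise the adjacency list is rebuilt so that the upstream models of any node can be listed. The function names, the `read_trace` callback and the sample data are hypothetical, and a dictionary again stands in for the cache database.

```python
def trace_data(cache_db, read_trace, output_path, follow_generation):
    """S4: return (address, adjacency); adjacency is None when not traced."""
    address = cache_db[output_path]
    if not follow_generation:
        return address, None
    adjacency = {}
    for src, dst in read_trace(address):
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])
    return address, adjacency


def upstream_models(adjacency, target):
    """Return every model whose output reaches `target`, directly or not."""
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for src, successors in adjacency.items():
            if node in successors and src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


cache_db = {"/data/result": "/trace/wf-001.xml"}
traces = {"/trace/wf-001.xml": [("m1", "m2"), ("m2", "m3"), ("m4", "m3")]}
address_only = trace_data(cache_db, traces.__getitem__, "/data/result", False)
address, graph = trace_data(cache_db, traces.__getitem__, "/data/result", True)
```

In this sample, `upstream_models(graph, "m3")` would collect m1, m2 and m4, that is, every model that contributed to the final output.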
As shown in Fig. 5, a data source tracing system based on a big data model analysis platform comprises:
a model workflow analysis module, for analyzing the input path, output path and action nodes of the model workflow formed by models on the Hadoop platform, and obtaining the unique identifier of each model in the model workflow;
a source trace information metadata model design module, for describing one source trace file for each model workflow;
a source trace information storage module, for building an index for the source trace files, storing the index information in the cache database and the index file on HDFS;
a data source tracing module, for obtaining the address of the source trace file by querying the index information, retrieving the source trace file from that address, building the source trace graph, and reproducing the generation process of the data.
The above is only a preferred embodiment of the present invention. It should be understood that the present invention is not limited to the form disclosed herein, which should not be regarded as excluding other embodiments; the invention can be used in various other combinations, modifications and environments, and can be modified within the scope contemplated herein through the above teachings or the skill or knowledge of the related art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the scope of protection of the appended claims.

Claims (9)

1. A data source tracing method based on a big data model analysis platform, characterized in that it comprises the following steps:
S1. Model workflow analysis: analyze the input node, output node and action nodes of the model workflow formed by models on the Hadoop platform, and obtain the unique identifier of each node;
S2. Source trace information metadata model design: describe one source trace file for each model workflow;
S3. Source trace information storage: build an index for the source trace files; the index information is stored in a cache database and the index file is stored on HDFS;
S4. Data source tracing: judge whether the data generation process is to be traced; if it is not, obtain the address of the source trace file by querying the index information.
2. The data source tracing method based on a big data model analysis platform according to claim 1, characterized in that step S1 includes the following sub-steps:
S11. Scan the model workflow and find its first action node; take the input file path of the first action node as the input file path of the model workflow;
Find the last action node of the model workflow and take its output file path as the output file path of the model workflow; save the input file path and the output file path of the model workflow;
S12. Detect all action nodes of the model workflow, obtain the unique identifier and name of the model workflow, and cache them in an adjacency list.
3. The data source tracing method based on a big data model analysis platform according to claim 2, characterized in that the method of describing one source trace file for each model workflow is:
S21. Scan the model workflow and obtain its control flow nodes, input file path and output file path;
S22. Detect all action nodes of the model workflow and the relations between them, and cache this information in an adjacency list;
S23. Write the cached information into the source trace file and save the source trace file on HDFS;
S24. Save the input file path and output file path of the model workflow and the address of the source trace file in the cache database as key-value pairs.
4. The data source tracing method based on a big data model analysis platform according to claim 2, characterized in that the source trace file is represented by a tuple W = {ID, I, O, M, T}, where ID is the unique identifier of the model workflow, I is the input node of the model workflow, O is the output node of the model workflow, M is the set of action nodes of the model workflow, and T is the timestamp at which the model workflow was built.
5. The data source tracing method based on a big data model analysis platform according to claim 4, characterized in that the set of action nodes is M = {m1, m2, ..., mn}, where mi denotes one model and each model is regarded as one action node; <mi, mj> denotes that the output of mi serves as the input of mj, so that M and the relations between the action nodes in M form a directed acyclic graph.
6. The data source tracing method based on a big data model analysis platform according to claim 1, characterized in that the address of the source trace file is obtained by querying the cache database with the output file path of the model workflow.
7. The data source tracing method based on a big data model analysis platform according to claim 1, characterized in that in step S4, if the data generation process is to be traced, the address of the source trace file is obtained by querying the index information, the source trace file is retrieved from that address, a source trace graph is built, and the generation process of the data is reproduced.
8. The data source tracing method based on a big data model analysis platform according to claim 7, characterized in that in step S4, tracing the data generation process includes the following sub-steps:
S51. Obtain the address of the source trace file by querying the index information;
S52. Read the source trace file stored on HDFS from that address and cache it in an adjacency list;
S53. Read all action nodes in the source trace file and the relations between them, construct a directed acyclic graph from the adjacency list, and reproduce the generation process of the data.
9. A data source tracing system based on a big data model analysis platform, characterized in that it comprises:
a model workflow analysis module, for analyzing the input path, output path and action nodes of the model workflow formed by models on the Hadoop platform, and obtaining the unique identifier of each model in the model workflow;
a source trace information metadata model design module, for describing one source trace file for each model workflow;
a source trace information storage module, for building an index for the source trace files, storing the index information in the cache database and the index file on HDFS;
a data source tracing module, for obtaining the address of the source trace file by querying the index information, retrieving the source trace file from that address, building the source trace graph, and reproducing the generation process of the data.
CN201610395246.2A 2016-06-03 2016-06-03 Data source tracing method and system based on big data model analysis platform Active CN106055676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610395246.2A CN106055676B (en) 2016-06-03 2016-06-03 Data source tracing method and system based on big data model analysis platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610395246.2A CN106055676B (en) 2016-06-03 2016-06-03 Data source tracing method and system based on big data model analysis platform

Publications (2)

Publication Number Publication Date
CN106055676A (en) 2016-10-26
CN106055676B (en) 2019-04-02

Family

ID=57169646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610395246.2A Active CN106055676B (en) 2016-06-03 2016-06-03 Data source tracing method and system based on big data model analysis platform

Country Status (1)

Country Link
CN (1) CN106055676B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100299339A1 (en) * 2009-05-20 2010-11-25 International Business Machines Corporation Indexing provenance data and evaluating provenance data queries in data processing systems
CN102117302A (en) * 2009-12-31 2011-07-06 南京理工大学 Data origin tracking method on sensor data stream complex query results
CN103164614A (en) * 2013-01-30 2013-06-19 南京理工大学常熟研究院有限公司 Recursive data tracing method at runtime for supporting data recurrence
CN103177184A (en) * 2013-01-30 2013-06-26 南京理工大学常熟研究院有限公司 Runtime recursion data source tracing method of low storage expenditure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
谢雨来: "溯源的高效存储管理及在安全方面的应用研究", 《中国博士学位论文全文数据库信息科技辑》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106802922B (en) * 2016-12-19 2020-07-10 华中科技大学 Tracing storage system and method based on object
CN106802922A (en) * 2016-12-19 2017-06-06 华中科技大学 A kind of object-based storage system and method for tracing to the source
CN106850564A (en) * 2016-12-29 2017-06-13 北京安天网络安全技术有限公司 A kind of method and system for positioning file transverse shifting path
CN107239523A (en) * 2017-05-26 2017-10-10 电子科技大学 A kind of fine-grained data source tracing method under the model platform based on big data
CN107704504A (en) * 2017-08-29 2018-02-16 中国科学院光电研究院 A kind of RINEX files are traced to the source information extracting method and system
CN109857924A (en) * 2019-02-28 2019-06-07 重庆科技学院 A kind of big data analysis monitor information processing system and method
CN111144755A (en) * 2019-12-26 2020-05-12 安徽朋德信息科技有限公司 Scientific research instrument experiment result traceability management system and method
CN111241100A (en) * 2020-01-09 2020-06-05 北京齐尔布莱特科技有限公司 Workflow configuration system and method
CN111241100B (en) * 2020-01-09 2024-03-01 北京齐尔布莱特科技有限公司 Workflow configuration system and method
CN111491018A (en) * 2020-04-07 2020-08-04 中国建设银行股份有限公司 Model downloading method and system
CN112199352A (en) * 2020-10-14 2021-01-08 武汉第二船舶设计研究所(中国船舶重工集团公司第七一九研究所) Product data tracing method and system
CN112269316A (en) * 2020-10-28 2021-01-26 中国科学院信息工程研究所 High-robustness threat hunting system and method based on graph neural network
CN114297262A (en) * 2021-12-30 2022-04-08 重庆允成互联网科技有限公司 Data tracing method based on data stream and computer storage medium
CN115964397A (en) * 2022-09-20 2023-04-14 成都比特信安科技有限公司 Data seed implantation and tracing method
CN115964397B (en) * 2022-09-20 2023-09-19 成都比特信安科技有限公司 Data seed implantation and tracing method

Also Published As

Publication number Publication date
CN106055676B (en) 2019-04-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant