CN107239523A - A fine-grained data provenance tracing method for a big-data-based model platform - Google Patents

A fine-grained data provenance tracing method for a big-data-based model platform

Info

Publication number
CN107239523A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710385468.0A
Other languages
Chinese (zh)
Inventor
林劼
杜亚伟
刘铸
高泽仁
段炜煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-05-26
Filing date
2017-05-26
Publication date
2017-10-10
Application filed by University of Electronic Science and Technology of China
Priority to CN201710385468.0A
Publication of CN107239523A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a fine-grained data provenance tracing method for a big-data-based model platform. It addresses the problem of identifying data dependencies when tracing provenance at fine granularity on a big-data platform. The method comprises the following steps: S1: model workflow analysis, analyzing the model workflows orchestrated by the Oozie engine on the Hadoop platform; S2: fine-grained provenance definition, expressing the fine-grained data provenance of a workflow in recursive form; S3: provenance information capture, dynamically generating and collecting provenance information while the models execute; S4: provenance identifier storage, storing the captured provenance information on HDFS as associations; S5: provenance tracing, tracing any fine-grained data item in a result data file back to the source input data that produced it. The invention is widely applicable, and it builds an index over the provenance files, which reduces I/O operations and improves query speed.

Description

A fine-grained data provenance tracing method for a big-data-based model platform
Technical field
The present invention relates to data provenance tracing methods, and in particular to a fine-grained data provenance tracing method for a big-data-based model platform.
Background
In recent years, with the development of computers and the mobile Internet, information of all kinds has grown explosively. This information falls broadly into two classes: raw recorded data, and data derived from the raw data through several processing steps. What users usually see is the result data. For the user, the processing history and trustworthiness of such data are unknown, and sometimes the result data bears no apparent relationship to the raw data. Users therefore have to care about where result data comes from, which gave rise to data provenance tracing technology.
Data provenance describes the origin of data and the process that generated it. This information plays an important role in many areas, such as debugging data and transformations, auditing, assessing data quality and trustworthiness, and enforcing access control over data. Data provenance is divided into coarse-grained and fine-grained provenance; domestic research on fine-grained provenance is relatively scarce.
Traditional fine-grained provenance methods are concentrated in the database field. They add annotation fields that record how each item in the database is processed and propagated. On a big-data platform, however, both source data and result data are stored on HDFS, so individual input items cannot be annotated directly. The present invention therefore proposes a fine-grained data provenance tracing method for big-data model platforms.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a fine-grained data provenance tracing method for a big-data model platform that can resolve data dependencies during provenance tracing on such a platform.
This object is achieved by the following technical solution: a fine-grained data provenance tracing method for a big-data model platform, comprising the following steps:
S1: Model workflow analysis: analyze the model workflows orchestrated by the Oozie engine on the Hadoop platform, focusing on the inputs and outputs of the workflow and on the data processing performed by the big-data processing frameworks;
S2: Fine-grained provenance definition: express the fine-grained data provenance of the workflow in recursive form;
S3: Provenance information capture: dynamically generate and collect provenance information while the models execute;
S4: Provenance identifier storage: store the captured provenance information on HDFS as associations;
S5: Provenance tracing: trace any fine-grained data item in a result data file back to the source input data that produced it.
The model workflow is a workflow on the Hadoop platform composed of control-flow nodes and action nodes, and is interpreted and executed by the Hadoop Oozie workflow engine server.
The fine-grained provenance definition takes a workflow $W$ on a big-data platform and expresses it as a four-tuple $W = \{I, O, M, P\}$, where $I = \{i_1, i_2, \ldots, i_n\}$ is the input set of the workflow and each $i$ is a single input item in an input file; $O = \{o_1, o_2, \ldots, o_n\}$ is the output set of the workflow and each $o$ is a single output item in an output file; $M = \{m_1, m_2, \ldots, m_n\}$ is the model set of the workflow and each $m$ is any model in the workflow; and $P$ is the fine-grained data provenance operation of the workflow.
Provenance information capture extends the model processing framework, adding functions that generate and propagate provenance information so that the provenance information produced while the models execute is passed along through the workflow's processing models.
Provenance identifier storage uses intermediate identifiers to associate the input items and output items of each model, and stores these provenance associations on HDFS as files.
Provenance tracing works from the stored provenance files and recursively traces any result data item back to all of the inputs that produced it; the tracing granularity is row-level data.
The control nodes of the model workflow have no effect on the data, so the analysis focuses on action nodes such as MapReduce, Hive, and Spark.
The fine-grained provenance definition comprises the following sub-steps:
S21: Single-model provenance: let any model transformation in the workflow be denoted $T$. Given a transformation instance $T(I) = O$ with input set $I$ and a single output element $o \in O$, fine-grained provenance must determine the input subset $P_T(o) \subseteq I$ that contributed to the output element $o$;
S22: Workflow provenance: workflow provenance traces through all model transformations involved in the current workflow and is expressed recursively in terms of single-transformation provenance. The provenance of workflow $W$ is denoted $P_W$, and the provenance of any single element $e$ in $W$ is denoted $P_W(e)$. If $e$ is an initial input element, i.e. $e \in I_k$, then $P_W(e) = \{e\}$. Otherwise, let $T$ be the transformation that output $e$ and let $P_T(e)$ be the one-level provenance of $e$; the recursion is then $P_W(e) = \bigcup_{e' \in P_T(e)} P_W(e')$.
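The recursive definition in S22 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recursive case is taken to be the union of $P_W$ over the one-level provenance $P_T(e)$, the one-level provenance is supplied as a plain dictionary, and the names `workflow_provenance`, `one_level`, and the sample elements are hypothetical rather than taken from the patent.

```python
from typing import Dict, FrozenSet, Set


def workflow_provenance(
    e: str,
    initial_inputs: Set[str],
    one_level: Dict[str, FrozenSet[str]],
) -> Set[str]:
    """Return P_W(e): the initial input elements that contributed to e.

    one_level[e] plays the role of P_T(e), the one-level provenance of e
    under the transformation T that produced it (assumed to be given).
    """
    # Base case: an initial input element is its own provenance.
    if e in initial_inputs:
        return {e}
    # Recursive case: union of the provenance of every one-level parent.
    result: Set[str] = set()
    for parent in one_level[e]:
        result |= workflow_provenance(parent, initial_inputs, one_level)
    return result


# Hypothetical two-stage workflow: inputs -> intermediate a1, a2 -> outputs.
initial_inputs = {"i1", "i2", "i3"}
one_level = {
    "a1": frozenset({"i1", "i2"}),
    "a2": frozenset({"i3"}),
    "o1": frozenset({"a1", "a2"}),
    "o2": frozenset({"a2"}),
}
```

Calling `workflow_provenance("o2", initial_inputs, one_level)` walks back through `a2` and stops at the initial input `i3`, which is the behavior the base case and recursive case describe.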
Provenance information capture comprises the following sub-steps:
S31: RecordReader extension: the RecordReader wrapper combines each output key-value pair $(k_i, v_i)$ with its unique identifier $q$ into $(k_i, \langle v_i, q \rangle)$ and passes it to the Mapper;
S32: Mapper extension: the Mapper wrapper takes the forwarded data $(k_i, \langle v_i, q \rangle)$ as input and unpacks it, passing the key-value pair $(k_i, v_i)$ to the underlying map function, which yields a new output key-value pair $(k_m, v_m)$. The Mapper wrapper then packages $(k_m, v_m)$ together with the unique identifier $q$ and outputs $(k_m, \langle v_m, q \rangle)$;
S33: Reducer extension: the Reducer wrapper receives the Mapper wrapper's output $(k_m, \langle v_m, q \rangle)$, iterates over the entries that share the same output key $k_m$, and passes the grouped values to the Reducer. At the same time it persists the map provenance mapping $\langle q, k_{ID} \rangle$. For each Reducer output $(k_o, v_o)$, the Reducer wrapper combines the association identifier $k_{ID}$ of the map provenance mapping with the Reducer output $(k_o, v_o)$ and passes the result to the RecordWriter wrapper;
S34: RecordWriter extension: the RecordWriter wrapper takes the provenance information produced by the Reducer wrapper as input; the RecordWriter generates a unique identifier $p$ for each output $(k_o, v_o)$, and the RecordWriter wrapper finally stores the reduce provenance information $\langle k_{ID}, p \rangle$.
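The wrapper chain in S31 to S34 can be simulated in plain Python without Hadoop. The sketch below is not the Hadoop API: the function names, the `file:offset` identifier format, and the use of a running integer as the association identifier $k_{ID}$ are all assumptions made for illustration. It shows how the identifier $q$ rides along with each value and how the two provenance mappings $\langle q, k_{ID} \rangle$ and $\langle k_{ID}, p \rangle$ fall out of the pipeline.

```python
from collections import defaultdict


def record_reader(file_name, lines):
    """S31: emit (k_i, (v_i, q)), where q identifies the input line."""
    offset = 0
    for line in lines:
        q = f"{file_name}:{offset}"  # assumed id format: file name + offset
        yield offset, (line, q)
        offset += len(line) + 1  # +1 for the newline; assumes 1 byte per char


def mapper_wrapper(map_fn, records):
    """S32: unpack q, call the underlying map function, re-attach q."""
    for k_i, (v_i, q) in records:
        for k_m, v_m in map_fn(k_i, v_i):
            yield k_m, (v_m, q)


def reducer_wrapper(reduce_fn, mapped):
    """S33: group by k_m, record <q, k_ID>, call the underlying reduce."""
    groups = defaultdict(list)
    for k_m, (v_m, q) in mapped:
        groups[k_m].append((v_m, q))
    map_prov, reduced = [], []
    for k_id, k_m in enumerate(sorted(groups)):  # k_id: one id per group
        map_prov.extend((q, k_id) for _, q in groups[k_m])
        values = [v for v, _ in groups[k_m]]
        for k_o, v_o in reduce_fn(k_m, values):
            reduced.append((k_o, v_o, k_id))
    return map_prov, reduced


def record_writer_wrapper(out_file, reduced):
    """S34: assign p to each output line and record <k_ID, p>."""
    out_lines, reduce_prov, offset = [], [], 0
    for k_o, v_o, k_id in reduced:
        line = f"{k_o}\t{v_o}"
        reduce_prov.append((k_id, f"{out_file}:{offset}"))
        out_lines.append(line)
        offset += len(line) + 1
    return out_lines, reduce_prov


# Word count as the underlying (unmodified) map and reduce functions.
def word_map(_key, line):
    return [(word, 1) for word in line.split()]


def word_reduce(key, values):
    return [(key, sum(values))]


records = record_reader("in.txt", ["a b", "b c"])
mapped = mapper_wrapper(word_map, records)
map_prov, reduced = reducer_wrapper(word_reduce, mapped)
out_lines, reduce_prov = record_writer_wrapper("out.txt", reduced)
```

The underlying `word_map` and `word_reduce` never see the identifiers, which mirrors the design choice in the text: provenance is carried by the wrappers, so user models do not need to change.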
Provenance identifier storage comprises the following sub-steps:
S41: Map provenance storage: store the provenance information produced by the map process. A unique identifier $q$ is generated from the file name and offset of the input data item, a unique association identifier $k_{ID}$ is generated for each distinct group, and the pair is stored in the map provenance file in the form $\langle q, k_{ID} \rangle$;
S42: Reduce provenance storage: store the provenance information produced by the reduce process. A unique identifier $p$ is generated from the file name and offset of the input data item, and the pair is stored in the reduce provenance file in the form $\langle k_{ID}, p \rangle$.
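As a rough sketch of S41 and S42, the pairs can be written as sorted, tab-separated lines so that a later lookup can use binary search. Everything here is an assumption for illustration: the local file system stands in for HDFS, and `make_id`, `write_pairs`, `read_pairs`, the `@` separator, the `g0`/`g1` association identifiers, and the file names are hypothetical.

```python
import os
import tempfile


def make_id(file_name: str, offset: int) -> str:
    """Build a unique identifier from a file name and a byte offset."""
    return f"{file_name}@{offset}"


def write_pairs(path, pairs):
    """Write provenance pairs as sorted, tab-separated lines."""
    with open(path, "w", encoding="utf-8") as f:
        for left, right in sorted(pairs):
            f.write(f"{left}\t{right}\n")


def read_pairs(path):
    """Read the pairs back in stored (sorted) order."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f]


# Demo in a temporary directory that stands in for an HDFS directory.
with tempfile.TemporaryDirectory() as prov_dir:
    map_path = os.path.join(prov_dir, "part-0.map.prov")
    reduce_path = os.path.join(prov_dir, "part-0.reduce.prov")
    # <q, k_ID>: input item identifier -> association identifier.
    write_pairs(map_path, [(make_id("in.txt", 4), "g1"), (make_id("in.txt", 0), "g0")])
    # <k_ID, p>: association identifier -> output item identifier.
    write_pairs(reduce_path, [("g1", make_id("out.txt", 4)), ("g0", make_id("out.txt", 0))])
    stored_map = read_pairs(map_path)
    stored_reduce = read_pairs(reduce_path)
```

Keeping the two files separate and joined only through $k_{ID}$ means each input item and each output item is written once, rather than once per input-output combination.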
Provenance tracing comprises the following sub-steps:
S51: Select the data item to be traced and issue the query;
S52: Determine the path of the file containing the data item and the item's offset, and trace its provenance with the backtrace method;
S53: Using the naming rule that links result files to provenance files, determine the name file of the provenance file to query. If that file is a reduce provenance file, go to S54; if it is a map provenance file, go to S55; otherwise the source has been reached, go to S56;
S54: Read the reduce provenance file named by file and search its rows by binary search. The first attribute of each row is pos; find the row whose pos equals the incoming offset value, read that row's second attribute, provenanceID, and go to S53, recursively calling backtrace(file, provenanceID);
S55: Read the map provenance file named by file and, using binary search, read each row and split it in order into lineId, fileId, and position. Find the rows whose lineId equals the incoming value, look up the input file name from fileId and file and assign it to file, and finally go to S53, recursively calling backtrace(file, position);
S56: Reaching this step means the source has been reached: output the file name and the input data directly. Execution ends once all data items have been traced.
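The control flow of S53 to S56 can be sketched as a recursive function over in-memory tables. This is a simplified illustration, not the patented implementation: the provenance files are modeled as sorted lists of tuples, the naming rule of S53 is replaced by dictionary lookups plus an explicit `stage` argument, and the table contents, `find_rows`, and `FILE_NAMES` are hypothetical. The binary search uses Python's standard `bisect_left`.

```python
from bisect import bisect_left

# Reduce provenance per result file: sorted rows (pos, provenance_id).
REDUCE_PROV = {"out.txt": [(0, 0), (4, 1), (8, 2)]}
# Map provenance per result file: sorted rows (line_id, file_id, position).
MAP_PROV = {"out.txt": [(0, 0, 0), (1, 0, 0), (1, 0, 4), (2, 0, 4)]}
# file_id -> input file name.
FILE_NAMES = {0: "in.txt"}


def find_rows(rows, key):
    """Binary-search sorted rows and return every row whose first field is key."""
    i = bisect_left(rows, (key,))  # (key,) sorts before any (key, ...) row
    matches = []
    while i < len(rows) and rows[i][0] == key:
        matches.append(rows[i])
        i += 1
    return matches


def backtrace(file, value, stage="result"):
    """Return the set of (source file, offset) pairs behind (file, value)."""
    sources = set()
    if stage == "result" and file in REDUCE_PROV:
        # S54: offset in the result file -> provenanceID.
        for _pos, prov_id in find_rows(REDUCE_PROV[file], value):
            sources |= backtrace(file, prov_id, stage="map")
        return sources
    if stage == "map" and file in MAP_PROV:
        # S55: provenanceID -> (input file, position), then recurse.
        for _line_id, file_id, position in find_rows(MAP_PROV[file], value):
            sources |= backtrace(FILE_NAMES[file_id], position)
        return sources
    # S56: no provenance file for this file, so it is a source input.
    return {(file, value)}


traced = backtrace("out.txt", 4)
```

Because the recursion restarts at the `result` stage for each input file it finds, an input that is itself the output of an earlier job (that is, one with its own entry in `REDUCE_PROV`) would be traced further back, which matches the multi-model workflows the text describes.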
The beneficial effects of the invention are as follows: it provides existing big-data model analysis platforms with an effective and correct data provenance tracing method, overcomes the inapplicability of conventional methods to big-data platforms, and builds an index over the provenance files, which reduces I/O operations and improves query speed.
Brief description of the drawings
Fig. 1 is a flow chart of the fine-grained provenance tracing method for a big-data model platform;
Fig. 2 is the first flow chart of provenance model construction;
Fig. 3 is the second flow chart of provenance model construction;
Fig. 4 is a diagram of the provenance identifier storage relationships;
Fig. 5 is a flow chart of fine-grained data provenance tracing.
Detailed description
The technical solution of the invention is described in further detail below with reference to a specific embodiment, but the scope of protection of the invention is not limited to what follows.
Embodiment 1
As shown in Fig. 1, a fine-grained data provenance tracing method for a big-data model platform comprises the following steps:
S1: Model workflow analysis: analyze the model workflows orchestrated by the Oozie engine on the Hadoop platform, focusing on the inputs and outputs of the workflow and on the data processing performed by the big-data processing frameworks;
S2: Fine-grained provenance definition: express the fine-grained data provenance of the workflow in recursive form;
S3: Provenance information capture: dynamically generate and collect provenance information while the models execute;
S4: Provenance identifier storage: store the captured provenance information on HDFS as associations;
S5: Provenance tracing: trace any fine-grained data item in a result data file back to the source input data that produced it.
The model workflow is a workflow on the Hadoop platform composed of control-flow nodes and action nodes, and is interpreted and executed by the Hadoop Oozie workflow engine server.
The fine-grained provenance definition takes a workflow $W$ on a big-data platform and expresses it as a four-tuple $W = \{I, O, M, P\}$, where $I = \{i_1, i_2, \ldots, i_n\}$ is the input set of the workflow and each $i$ is a single input item in an input file; $O = \{o_1, o_2, \ldots, o_n\}$ is the output set of the workflow and each $o$ is a single output item in an output file; $M = \{m_1, m_2, \ldots, m_n\}$ is the model set of the workflow and each $m$ is any model in the workflow; and $P$ is the fine-grained data provenance operation of the workflow.
As shown in Figs. 2 and 3, provenance information capture extends the native big-data model processing framework, adding functions that generate and propagate provenance information so that the provenance information produced while the models execute is passed along through the workflow's processing models.
As shown in Fig. 4, provenance identifier storage uses intermediate identifiers to associate the input items and output items of each model, and stores these provenance associations on HDFS as files.
As shown in Fig. 5, provenance tracing works from the stored provenance files and recursively traces any result data item back to all of the inputs that produced it; the tracing granularity is row-level data.
The control nodes of the model workflow have no effect on the data, so the analysis focuses on action nodes such as MapReduce, Hive, and Spark. Taking MapReduce as an example, the MapReduce framework consists mainly of two stages:
Map stage: let the map function be $M$ and the input data set be $I$. For each element $i \in I$, $M$ produces zero or more output elements, i.e.
$M(I) = \bigcup_{i \in I} M(\{i\})$
Reduce stage: let the reduce function be $R$ and the input data set be $I$, where each element is a key-value pair. The output of $R$ consists of the zero or more elements produced for each group of entries in $I$ that share the same key. Let $k_1, k_2, \ldots, k_n$ denote the distinct keys in $I$, and let $G_j$ be the set of all key-value pairs in $I$ whose key equals $k_j$; then
$R(I) = \bigcup_{j \in [1,n]} R(G_j)$
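The two stage definitions can be checked on a toy word count. The sketch below is an in-memory illustration rather than Hadoop code; `run_map`, `run_reduce`, and the sample lines are hypothetical names and data, and the sets in the formulas are modeled as Python lists.

```python
from itertools import groupby
from operator import itemgetter


def run_map(map_fn, inputs):
    """Apply map_fn to each input element independently and merge the outputs."""
    out = []
    for element in inputs:
        out.extend(map_fn(element))  # each element yields zero or more outputs
    return out


def run_reduce(reduce_fn, pairs):
    """Apply reduce_fn once per group G_j of pairs that share a key k_j."""
    out = []
    ordered = sorted(pairs, key=itemgetter(0))
    for _key, group in groupby(ordered, key=itemgetter(0)):
        out.extend(reduce_fn(list(group)))
    return out


def word_map(line):
    return [(word, 1) for word in line.split()]


def word_reduce(group):
    key = group[0][0]
    return [(key, sum(value for _, value in group))]


lines = ["a b", "b c"]
mapped = run_map(word_map, lines)
counts = run_reduce(word_reduce, mapped)
```

The property that makes fine-grained provenance tractable is visible here: each map output depends on exactly one input element, and each reduce output depends on exactly one key group, so dependencies can be recorded per element and per group.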
The fine-grained provenance definition comprises the following sub-steps:
S21: Single-model provenance: let any model transformation in the workflow be denoted $T$. Given a transformation instance $T(I) = O$ with input set $I$ and a single output element $o \in O$, fine-grained provenance must determine the input subset $P_T(o) \subseteq I$ that contributed to the output element $o$;
S22: Workflow provenance: workflow provenance traces through all model transformations involved in the current workflow and is expressed recursively in terms of single-transformation provenance. The provenance of workflow $W$ is denoted $P_W$, and the provenance of any single element $e$ in $W$ is denoted $P_W(e)$. If $e$ is an initial input element, i.e. $e \in I_k$, then $P_W(e) = \{e\}$. Otherwise, let $T$ be the transformation that output $e$ and let $P_T(e)$ be the one-level provenance of $e$; the recursion is then $P_W(e) = \bigcup_{e' \in P_T(e)} P_W(e')$.
Provenance information capture comprises the following sub-steps:
S31: RecordReader extension: the RecordReader wrapper combines each output key-value pair $(k_i, v_i)$ with its unique identifier $q$ into $(k_i, \langle v_i, q \rangle)$ and passes it to the Mapper;
S32: Mapper extension: the Mapper wrapper takes the forwarded data $(k_i, \langle v_i, q \rangle)$ as input and unpacks it, passing the key-value pair $(k_i, v_i)$ to the underlying map function, which yields a new output key-value pair $(k_m, v_m)$. The Mapper wrapper then packages $(k_m, v_m)$ together with the unique identifier $q$ and outputs $(k_m, \langle v_m, q \rangle)$;
S33: Reducer extension: the Reducer wrapper receives the Mapper wrapper's output $(k_m, \langle v_m, q \rangle)$, iterates over the entries that share the same output key $k_m$, and passes the grouped values to the Reducer. At the same time it persists the map provenance mapping $\langle q, k_{ID} \rangle$. For each Reducer output $(k_o, v_o)$, the Reducer wrapper combines the association identifier $k_{ID}$ of the map provenance mapping with the Reducer output $(k_o, v_o)$ and passes the result to the RecordWriter wrapper;
S34: RecordWriter extension: the RecordWriter wrapper takes the provenance information produced by the Reducer wrapper as input; the RecordWriter generates a unique identifier $p$ for each output $(k_o, v_o)$, and the RecordWriter wrapper finally stores the reduce provenance information $\langle k_{ID}, p \rangle$.
Provenance identifier storage comprises the following sub-steps:
S41: Map provenance storage: store the provenance information produced by the map process. A unique identifier $q$ is generated from the file name and offset of the input data item, a unique association identifier $k_{ID}$ is generated for each distinct group, and the pair is stored in the map provenance file in the form $\langle q, k_{ID} \rangle$;
S42: Reduce provenance storage: store the provenance information produced by the reduce process. A unique identifier $p$ is generated from the file name and offset of the input data item, and the pair is stored in the reduce provenance file in the form $\langle k_{ID}, p \rangle$.
Provenance tracing comprises the following sub-steps:
S51: Select the data item to be traced and issue the query;
S52: Determine the path of the file containing the data item and the item's offset, and trace its provenance with the backtrace method;
S53: Using the naming rule that links result files to provenance files, determine the name file of the provenance file to query. If that file is a reduce provenance file, go to S54; if it is a map provenance file, go to S55; otherwise the source has been reached, go to S56;
S54: Read the reduce provenance file named by file and search its rows by binary search. The first attribute of each row is pos; find the row whose pos equals the incoming offset value, read that row's second attribute, provenanceID, and go to S53, recursively calling backtrace(file, provenanceID);
S55: Read the map provenance file named by file and, using binary search, read each row and split it in order into lineId, fileId, and position. Find the rows whose lineId equals the incoming value, look up the input file name from fileId and file and assign it to file, and finally go to S53, recursively calling backtrace(file, position);
S56: Reaching this step means the source has been reached: output the file name and the input data directly. Execution ends once all data items have been traced.
The above is only the preferred embodiment of the invention. It should be understood that the invention is not limited to the form disclosed here, which should not be regarded as excluding other embodiments; the invention may be used in various other combinations, modifications, and environments, and may be modified within the scope contemplated here through the above teachings or through the skill or knowledge of the related art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the scope of protection of the appended claims.

Claims (10)

1. A fine-grained data provenance tracing method for a big-data-based model platform, characterized in that it comprises the following steps:
S1: Model workflow analysis: analyze the model workflows orchestrated by the Oozie engine on the Hadoop platform;
S2: Fine-grained provenance definition: express the fine-grained data provenance of the workflow in recursive form;
S3: Provenance information capture: dynamically generate and collect provenance information while the models execute;
S4: Provenance identifier storage: store the captured provenance information on HDFS as associations;
S5: Provenance tracing: trace any fine-grained data item in a result data file back to the source input data that produced it.
2. The fine-grained data provenance tracing method for a big-data-based model platform according to claim 1, characterized in that the model workflow is a workflow on the Hadoop platform composed of control-flow nodes and action nodes, and is interpreted and executed by the Hadoop Oozie workflow engine server.
3. The fine-grained data provenance tracing method for a big-data-based model platform according to claim 1, characterized in that the fine-grained provenance definition takes a workflow $W$ on a big-data platform and expresses it as a four-tuple $W = \{I, O, M, P\}$, where $I$ is the input set of the workflow, $O$ is the output set of the workflow, $M$ is the model set of the workflow, and $P$ is the fine-grained data provenance operation of the workflow.
4. The fine-grained data provenance tracing method for a big-data-based model platform according to claim 1, characterized in that the provenance information capture extends the model processing framework, adding functions that generate and propagate provenance information so that the provenance information produced while the models execute is passed along through the workflow's processing models.
5. The fine-grained data provenance tracing method for a big-data-based model platform according to claim 1, characterized in that the provenance identifier storage uses intermediate identifiers to associate the input items and output items of each model, and stores these provenance associations on HDFS as files.
6. The fine-grained data provenance tracing method for a big-data-based model platform according to claim 1, characterized in that the provenance tracing works from the stored provenance files and recursively traces any result data item back to all of the inputs that produced it, the tracing granularity being row-level data.
7. The fine-grained data provenance tracing method for a big-data-based model platform according to claim 1, characterized in that the fine-grained provenance definition comprises the following sub-steps:
S21: Single-model provenance: let any model transformation in the workflow be denoted $T$. Given a transformation instance $T(I) = O$ with input set $I$ and a single output element $o \in O$, fine-grained provenance must determine the input subset $P_T(o) \subseteq I$ that contributed to the output element $o$;
S22: Workflow provenance: workflow provenance traces through all model transformations involved in the current workflow and is expressed recursively in terms of single-transformation provenance.
8. The fine-grained data provenance tracing method for a big-data-based model platform according to claim 1, characterized in that the provenance information capture comprises the following sub-steps:
S31: RecordReader extension: the RecordReader wrapper passes each output key-value pair to the Mapper together with its unique identifier;
S32: Mapper extension: the Mapper wrapper takes the forwarded data as input and unpacks it, passing the key-value pair to the underlying map function, which yields a new output key-value pair; the Mapper wrapper then packages the new output key-value pair together with the unique identifier and outputs the result;
S33: Reducer extension: the Reducer wrapper receives the Mapper wrapper's output, iterates over the entries that share the same output key, and passes the grouped values to the Reducer; at the same time the Reducer wrapper persists the map provenance information; for each Reducer output, the Reducer wrapper combines the map provenance information with the Reducer output and passes the result to the RecordWriter wrapper;
S34: RecordWriter extension: the RecordWriter wrapper takes the provenance information produced by the Reducer wrapper as input; the RecordWriter generates a unique identifier $p$ for each output, and the RecordWriter wrapper finally stores the reduce provenance information.
9. The fine-grained data provenance tracing method for a big-data-based model platform according to claim 1, characterized in that the provenance identifier storage comprises the following sub-steps:
S41: Map provenance storage: store the provenance information produced by the map process; a unique identifier $q$ is generated from the file name and offset of the input data, a unique association identifier $k_{ID}$ is generated for each distinct group, and the pair is stored in the map provenance file in the form $\langle q, k_{ID} \rangle$;
S42: Reduce provenance storage: store the provenance information produced by the reduce process; a unique identifier $p$ is generated from the file name and offset of the input data item, and the pair is stored in the reduce provenance file in the form $\langle k_{ID}, p \rangle$.
10. The fine-grained data provenance tracing method for a big-data-based model platform according to claim 1, wherein the provenance tracing comprises the following sub-steps:
S51: Select the data item to be traced and issue the query;
S52: Determine the file containing the data item and the item's offset, and trace its provenance with the backtrace method;
S53: Using the naming rule that links result files to provenance files, determine the name of the provenance file to query; if that file is a reduce provenance file, go to S54; if it is a map provenance file, go to S55; otherwise the source has been reached, go to S56;
S54: Read the reduce provenance file by its file name and search its rows by binary search; the first attribute of each row is pos; find the row whose pos equals the incoming offset value, read that row's second attribute, provenanceID, and go to S53, recursively calling backtrace(file, provenanceID);
S55: Read the map provenance file by its file name and, using binary search, read each row and split it in order into lineId, fileId, and position; find the rows whose lineId equals the incoming value, look up the input file name from fileId and file and assign it to file, and finally go to S53, recursively calling backtrace(file, position);
S56: Reaching this step means the source has been reached: output the file name and the input data directly; execution ends once all data items have been traced.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710385468.0A CN107239523A (en) 2017-05-26 2017-05-26 A kind of fine-grained data source tracing method under the model platform based on big data

Publications (1)

Publication Number Publication Date
CN107239523A (en) 2017-10-10

Family

ID=59985232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710385468.0A Pending CN107239523A (en) 2017-05-26 2017-05-26 A kind of fine-grained data source tracing method under the model platform based on big data

Country Status (1)

Country Link
CN (1) CN107239523A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104484616A (en) * 2014-12-03 2015-04-01 浪潮电子信息产业股份有限公司 Privacy protection method under MapReduce data processing framework
US20160012153A1 (en) * 2014-07-08 2016-01-14 Jpmorgan Chase Bank, N.A. Capturing run-time metadata
CN105721883A (en) * 2014-12-05 2016-06-29 华中科技大学 Video sharing method and system in cloud storage system based on source tracing information
CN106055676A (en) * 2016-06-03 2016-10-26 电子科技大学 Data source tracing method and system based on big data model analysis platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
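ROBERT IKEDA et al.: "Provenance for generalized map and reduce workflows", 《CIDR》 *
ZHANG Xiong et al.: "Provenance implementation for the aircraft design domain" (面向飞行器设计领域的溯源实现), 《微电子学与计算机》 (Microelectronics & Computer) *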

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171010