CN107239523A - A fine-grained data provenance method on a big-data-based model platform - Google Patents
- Publication number
- CN107239523A
- Authority
- CN
- China
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a fine-grained data provenance method on a big-data-based model platform. The method constructs fine-grained provenance for big-data model platforms in order to solve the problem of distinguishing data dependencies in fine-grained provenance on big-data platforms. It comprises the following steps: S1: model workflow analysis, analyzing the model workflows composed by the Oozie engine on the Hadoop platform; S2: fine-grained provenance definition, expressing the fine-grained data provenance of a workflow in recursive form; S3: provenance information capture, dynamically generating and acquiring provenance information during model execution; S4: provenance identifier storage, storing the captured provenance information on HDFS in the form of associations; S5: provenance tracing, tracing back to the source input data of the fine-grained data items produced in result data files. The invention is widely applicable, and by building indexes for the provenance files it reduces I/O operations and improves query speed.
Description
Technical field
The present invention relates to a data provenance method, and in particular to a fine-grained data provenance method on a big-data-based model platform.
Background art
In recent years, with the development of computers and the mobile Internet, all kinds of information have grown explosively. This information falls broadly into two classes: original recorded data, and data derived from it through some processing. What is usually exposed to users, however, is the result data. For users, the processing behind these data, and hence their credibility, is unknown, and sometimes the result data bear no apparent relation to the original data. Users therefore have to be concerned with where result data come from, which gave rise to data provenance technology.
Data provenance describes the origin of data and the process by which it was generated. This information plays an important role in many areas, such as debugging data and transformations, auditing, assessing data quality and trustworthiness, and implementing access control over data. Data provenance can be divided into coarse-grained and fine-grained provenance; domestic research on fine-grained provenance is relatively scarce.
Traditional fine-grained data provenance methods are concentrated mainly in the database field. Their solution is to add annotation fields that record how each item in the database is processed and propagated. On a big-data platform, however, both source data and result data are stored on HDFS, so individual input data items cannot be annotated directly. The present invention therefore proposes a fine-grained data provenance method for big-data model platforms.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a fine-grained data provenance method based on a big-data model platform, which solves the problem of distinguishing data dependencies during data provenance on a big-data model platform.
The object of the present invention is achieved through the following technical solution: a fine-grained data provenance method based on a big-data model platform, comprising the following steps:
S1: Model workflow analysis: analyze the model workflows composed by the Oozie engine on the Hadoop platform, focusing on the inputs and outputs of the workflow and on the data processing performed by the big-data processing frameworks;
S2: Fine-grained provenance definition: express the fine-grained data provenance of a workflow in recursive form;
S3: Provenance information capture: dynamically generate and acquire provenance information during model execution;
S4: Provenance identifier storage: store the captured provenance information on HDFS in the form of associations;
S5: Provenance tracing: trace back to the source input data of the fine-grained data items produced in result data files.
The model workflow is a workflow composed of control-flow nodes and action nodes on the Hadoop platform, and is interpreted and executed by the Hadoop Oozie workflow engine server.
In the fine-grained provenance definition, a workflow W on a big-data platform is given and expressed as a four-tuple $W=\{I,O,M,P\}$, where $I=\{i_1,i_2,\dots,i_n\}$ is the input set of the workflow, each $i$ being a single input item in an input file; $O=\{o_1,o_2,\dots,o_n\}$ is the output set of the workflow, each $o$ being a single output item in an output file; $M=\{m_1,m_2,\dots,m_n\}$ is the set of models in the workflow, each $m$ being an arbitrary model in the workflow; and $P$ is the fine-grained data provenance operation of the workflow.
The provenance information capture extends the model processing frameworks and adds functions for generating and propagating provenance information, so that the provenance information produced during model execution is propagated through the workflow's processing models.
The provenance identifier storage uses intermediate identifiers to associate the input items and output items of each model, and stores these provenance associations on HDFS as files.
The provenance tracing works from the stored provenance files and, in a recursive manner, traces any result data item back to all of the inputs that produced it; tracing is performed at row-level granularity.
Control nodes in the model workflow have no effect on the data, so the analysis focuses on action nodes such as MapReduce, Hive and Spark.
The fine-grained provenance definition comprises the following sub-steps:
S21: Single-model provenance: let any model transformation in the workflow be denoted T. Given a transformation instance $T(I)=O$ with input set $I$ and a single output element $o\in O$, fine-grained provenance must determine the subset of inputs that contributed to the output element $o$, denoted $P_T(o)\subseteq I$;
S22: Workflow provenance: workflow provenance traces back through all model transformations involved in the current workflow and is expressed recursively in terms of single-transformation provenance. The provenance of workflow W is denoted $P_W$, and the provenance of any single element $e$ in W is denoted $P_W(e)$. If $e$ is an initial input element, i.e. $e\in I_k$, then $P_W(e)=\{e\}$; otherwise, let $T$ be the transformation that outputs $e$, with $P_T(e)$ the one-level provenance of $e$, and the recursion is
$$P_W(e)=\bigcup_{e'\in P_T(e)}P_W(e')$$
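The recursion in S22 can be illustrated with a small Python sketch. It assumes that the one-level provenance $P_T(e)$ of every derived element is already known and is supplied as a lookup function; the names workflow_provenance, INPUTS and ONE_LEVEL and the two-step example workflow are hypothetical and serve only to illustrate the definition, not the patent's implementation.

```python
from typing import Callable, Dict, Set

Element = str  # element labels; any hashable type would do


def workflow_provenance(
    e: Element,
    initial_inputs: Set[Element],
    one_level: Callable[[Element], Set[Element]],
) -> Set[Element]:
    """P_W(e): {e} for an initial input, otherwise the union of P_W(e') over e' in P_T(e)."""
    if e in initial_inputs:
        return {e}
    result: Set[Element] = set()
    for parent in one_level(e):  # one_level(e) plays the role of P_T(e)
        result |= workflow_provenance(parent, initial_inputs, one_level)
    return result


# Hypothetical two-step workflow: T1 derives a, b from the inputs; T2 derives o1, o2 from a, b.
INPUTS: Set[Element] = {"i1", "i2", "i3"}
ONE_LEVEL: Dict[Element, Set[Element]] = {
    "a": {"i1", "i2"},
    "b": {"i3"},
    "o1": {"a"},
    "o2": {"a", "b"},
}

if __name__ == "__main__":
    print(sorted(workflow_provenance("o2", INPUTS, lambda x: ONE_LEVEL[x])))
    # ['i1', 'i2', 'i3']
```

Each recursive call moves back one transformation, so the depth of the recursion is bounded by the number of models the element passed through.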
The provenance information capture comprises the following sub-steps:
S31: RecordReader extension: the RecordReader wrapper combines each output key-value pair $(k_i,v_i)$ it produces with the corresponding unique identifier $q$ into $(k_i,\langle v_i,q\rangle)$ and passes it to the Mapper;
S32: Mapper extension: the Mapper wrapper takes the forwarded data $(k_i,\langle v_i,q\rangle)$ as input and decomposes it, passing the key-value pair $(k_i,v_i)$ to the underlying map function, which produces a new output key-value pair $(k_m,v_m)$; the Mapper wrapper then encapsulates $(k_m,v_m)$ together with the unique identifier $q$ as $(k_m,\langle v_m,q\rangle)$ and outputs it;
S33: Reducer extension: after receiving the outputs of the Mapper wrapper, the Reducer wrapper traverses them grouped by identical new output key $k_m$ and passes the traversed key-value pairs to the Reducer, while persistently storing the map provenance information. For each Reducer output $(k_o,v_o)$, the Reducer wrapper combines the map provenance information with the Reducer output and passes the combination to the RecordWriter wrapper;
S34: RecordWriter extension: the RecordWriter wrapper takes the provenance information processed by the Reducer wrapper as input, the RecordWriter generates a unique identifier $p$ for each output $(k_o,v_o)$, and finally the RecordWriter wrapper stores the reduce provenance information.
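A minimal, single-process Python sketch of the wrapper chain in S31 to S34 follows. It does not use the Hadoop API, and several details are assumptions made only for illustration: the function names, the word-count job, the encoding of q as (filename, line index), the assignment of k_ID as one sequential number per key group, and the use of the output line number as p. The sketch shows how q travels alongside each value and how the two provenance tables are filled as a side effect of running the job.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Q = Tuple[str, int]  # q: (input filename, record offset); this encoding is an assumption
MapFn = Callable[[int, str], Iterable[Tuple[str, int]]]
ReduceFn = Callable[[str, List[int]], Iterable[Tuple[str, int]]]


def record_reader(files: Dict[str, List[str]]) -> Iterator[Tuple[int, Tuple[str, Q]]]:
    """S31: emit (k_i, <v_i, q>); here k_i is the line index within its file."""
    for name, lines in files.items():
        for offset, line in enumerate(lines):
            yield offset, (line, (name, offset))


def mapper_wrapper(map_fn: MapFn, records) -> Iterator[Tuple[str, Tuple[int, Q]]]:
    """S32: hide q from the user map function, then re-attach it to every output."""
    for k_i, (v_i, q) in records:
        for k_m, v_m in map_fn(k_i, v_i):
            yield k_m, (v_m, q)


def reducer_wrapper(reduce_fn: ReduceFn, pairs,
                    map_prov: List[Tuple[Q, int]]) -> Iterator[Tuple[str, Tuple[int, int]]]:
    """S33: group by k_m, persist <q, k_ID> per group, tag each reduce output with k_ID."""
    groups: Dict[str, List[Tuple[int, Q]]] = defaultdict(list)
    for k_m, (v_m, q) in pairs:
        groups[k_m].append((v_m, q))
    for k_id, k_m in enumerate(sorted(groups)):  # k_ID: one id per key group (assumed scheme)
        for src in dict.fromkeys(q for _, q in groups[k_m]):  # distinct q values, order kept
            map_prov.append((src, k_id))
        for k_o, v_o in reduce_fn(k_m, [v for v, _ in groups[k_m]]):
            yield k_o, (v_o, k_id)


def record_writer(tagged, out: List[Tuple[str, int]],
                  reduce_prov: List[Tuple[int, int]]) -> None:
    """S34: give each output an id p (its line number here) and persist <k_ID, p>."""
    for k_o, (v_o, k_id) in tagged:
        p = len(out)
        out.append((k_o, v_o))
        reduce_prov.append((k_id, p))


def word_count(files: Dict[str, List[str]]):
    """Run a word count through the wrapped pipeline; return output and both provenance tables."""
    out: List[Tuple[str, int]] = []
    map_prov: List[Tuple[Q, int]] = []
    reduce_prov: List[Tuple[int, int]] = []
    mapped = mapper_wrapper(lambda _k, line: [(w, 1) for w in line.split()], record_reader(files))
    reduced = reducer_wrapper(lambda k, vs: [(k, sum(vs))], mapped, map_prov)
    record_writer(reduced, out, reduce_prov)
    return out, map_prov, reduce_prov


if __name__ == "__main__":
    out, map_prov, reduce_prov = word_count({"f1": ["a b"], "f2": ["b c b"]})
    print(out)          # [('a', 1), ('b', 3), ('c', 1)]
    print(map_prov)     # [(('f1', 0), 0), (('f1', 0), 1), (('f2', 0), 1), (('f2', 0), 2)]
    print(reduce_prov)  # [(0, 0), (1, 1), (2, 2)]
```

The user map and reduce functions never see q or k_ID; only the wrappers handle provenance, which is the point of extending the framework rather than the models themselves.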
The provenance identifier storage comprises the following sub-steps:
S41: Map provenance storage: the provenance information produced by the map process is stored. A unique identifier $q$ is generated from the filename and offset of the input data item, a unique association identifier $k_{ID}$ is generated for each distinct group, and the pairs are stored in the map provenance file in the form $\langle q,k_{ID}\rangle$;
S42: Reduce provenance storage: the provenance information produced by the reduce process is stored. A unique identifier $p$ is generated from the filename and offset of the input data item, and the pairs are stored in the reduce provenance file in the form $\langle k_{ID},p\rangle$.
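The tracing steps read the first attribute of each row as the search key (pos in a reduce provenance file, lineId in a map provenance file) and locate it by binary search, which only works if each file is sorted on that column. The Python sketch below shows one possible on-disk layout under that assumption; it is not the patent's stated format. Because it keys each file by its lookup column, it writes reduce rows as p before k_ID, whereas S42 lists the pair as <k_ID, p>. Replacing filenames by small integer fileIds, with a separate id-to-name table, is likewise an assumed detail suggested by the fileId attribute used during tracing.

```python
from typing import Dict, List, Tuple

Q = Tuple[str, int]  # q: (input filename, offset)


def map_provenance_lines(records: List[Tuple[Q, int]]) -> Tuple[List[str], Dict[int, str]]:
    """S41: write <q, k_ID> as 'k_ID<TAB>fileId<TAB>offset' lines, sorted by k_ID."""
    file_ids: Dict[str, int] = {}
    rows = []
    for (filename, offset), k_id in records:
        file_id = file_ids.setdefault(filename, len(file_ids))  # first file seen gets id 0
        rows.append((k_id, file_id, offset))
    rows.sort()
    lines = ["\t".join(str(x) for x in row) for row in rows]
    return lines, {i: name for name, i in file_ids.items()}


def reduce_provenance_lines(records: List[Tuple[int, int]]) -> List[str]:
    """S42: write <k_ID, p> as 'p<TAB>k_ID' lines, sorted by the output id p."""
    return [f"{p}\t{k_id}" for p, k_id in sorted((p, k_id) for k_id, p in records)]


if __name__ == "__main__":
    lines, ids = map_provenance_lines([(("f2", 7), 1), (("f1", 3), 0), (("f1", 9), 1)])
    print(lines)  # ['0\t1\t3', '1\t0\t7', '1\t1\t9']
    print(ids)    # {0: 'f2', 1: 'f1'}
    print(reduce_provenance_lines([(5, 1), (3, 0)]))  # ['0\t3', '1\t5']
```

One tab-separated row per association keeps the files plain text on HDFS, and the sort order is what turns them into the index that lets a lookup avoid scanning the whole file.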
The provenance tracing comprises the following sub-steps:
S51: select the data item to be traced and issue the query;
S52: determine the file path and offset to which the data item belongs, and trace it with the backtrace method;
S53: using the naming rule between result files and provenance files, determine the name (file) of the provenance file to be queried. If the provenance file currently to be queried is a reduce provenance file, go to S54; if it is a map provenance file, go to S55; otherwise the source has been reached, so go to S56;
S54: read the reduce provenance file named by file and search its rows by binary search. The first attribute of each row is pos; find the row whose pos equals the incoming offset value, read the second attribute of that row as provenanceID, and go to S53 to call backtrace(file, provenanceID) recursively;
S55: read the map provenance file named by file and search its rows by binary search, splitting each row in turn into lineId, fileId and position; find the row whose lineId equals the incoming offset value, look up the name of the input file from fileId and file, set file to that name, and finally go to S53 to call backtrace(file, position) recursively;
S56: reaching this step means the source has been found, so the filename and input data are output directly. Execution ends when all data items have been traced.
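The recursion in S53 to S56 can be sketched in Python as below. Everything concrete here is an assumption for illustration: the naming rule (a result file X has provenance files X.reduce.prov and X.map.prov beside it), an in-memory dictionary standing in for the provenance files on HDFS, rows already parsed and sorted by their first attribute, and a fileId-to-name table. The sketch also returns every map row with a matching lineId, on the assumption that one group can draw on several input items. The example chains two jobs (in.txt to mid, then mid to out), so the recursion passes through two reduce/map provenance pairs before reaching source rows.

```python
import bisect
from typing import Dict, List, Set, Tuple

# Hypothetical naming rule (S53): result file X has X.reduce.prov and X.map.prov beside it.
REDUCE_SUFFIX = ".reduce.prov"
MAP_SUFFIX = ".map.prov"

Store = Dict[str, List[tuple]]  # in-memory stand-in for HDFS files; rows sorted by column 0


def rows_with_key(rows: List[tuple], key: int) -> List[tuple]:
    """Binary search for all rows whose first attribute equals key."""
    lo = bisect.bisect_left(rows, (key,))      # first row with first attribute >= key
    hi = bisect.bisect_left(rows, (key + 1,))  # first row with first attribute > key
    return rows[lo:hi]


def backtrace(store: Store, file_ids: Dict[int, str], file: str, offset: int,
              found: Set[Tuple[str, int]]) -> None:
    if file.endswith(MAP_SUFFIX):  # S55: rows are (lineId, fileId, position)
        for _line_id, file_id, position in rows_with_key(store[file], offset):
            backtrace(store, file_ids, file_ids[file_id], position, found)
    elif file + REDUCE_SUFFIX in store:  # S54: rows are (pos, provenanceID)
        for _pos, provenance_id in rows_with_key(store[file + REDUCE_SUFFIX], offset):
            backtrace(store, file_ids, file + MAP_SUFFIX, provenance_id, found)
    else:  # S56: no provenance file, so (file, offset) is a source input item
        found.add((file, offset))


STORE: Store = {
    # job 1: in.txt -> mid
    "mid.reduce.prov": [(0, 0), (1, 1)],
    "mid.map.prov": [(0, 0, 0), (0, 0, 2), (1, 0, 1)],
    # job 2: mid -> out
    "out.reduce.prov": [(0, 5)],
    "out.map.prov": [(5, 1, 0), (5, 1, 1)],
}
FILE_IDS = {0: "in.txt", 1: "mid"}

if __name__ == "__main__":
    found: Set[Tuple[str, int]] = set()
    backtrace(STORE, FILE_IDS, "out", 0, found)
    print(sorted(found))  # [('in.txt', 0), ('in.txt', 1), ('in.txt', 2)]
```

The sketch collects (filename, offset) pairs; outputting the input data itself, as S56 describes, would then be a read at each offset. Because each lookup is a binary search over a sorted file, a trace touches only the rows it needs rather than scanning every provenance file.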
The beneficial effects of the invention are as follows: it provides an effective and correct data provenance method for existing big-data model analysis platforms, overcomes the problem that conventional methods are not applicable on big-data platforms, and builds indexes for the provenance files, which reduces I/O operations and improves query speed.
Brief description of the drawings
Fig. 1 is a flowchart of the fine-grained provenance method based on a big-data model platform;
Fig. 2 is the first flowchart of provenance model construction;
Fig. 3 is the second flowchart of provenance model construction;
Fig. 4 is a relationship diagram of provenance identifier storage;
Fig. 5 is a flowchart of fine-grained data provenance tracing.
Embodiment
The technical solution of the present invention is described in further detail below with reference to a specific embodiment, but the scope of protection of the present invention is not limited to the following description.
Embodiment 1
As shown in Fig. 1, a fine-grained data provenance method based on a big-data model platform comprises the following steps:
S1: Model workflow analysis: analyze the model workflows composed by the Oozie engine on the Hadoop platform, focusing on the inputs and outputs of the workflow and on the data processing performed by the big-data processing frameworks;
S2: Fine-grained provenance definition: express the fine-grained data provenance of a workflow in recursive form;
S3: Provenance information capture: dynamically generate and acquire provenance information during model execution;
S4: Provenance identifier storage: store the captured provenance information on HDFS in the form of associations;
S5: Provenance tracing: trace back to the source input data of the fine-grained data items produced in result data files.
The model workflow is a workflow composed of control-flow nodes and action nodes on the Hadoop platform, and is interpreted and executed by the Hadoop Oozie workflow engine server.
In the fine-grained provenance definition, a workflow W on a big-data platform is given and expressed as a four-tuple $W=\{I,O,M,P\}$, where $I=\{i_1,i_2,\dots,i_n\}$ is the input set of the workflow, each $i$ being a single input item in an input file; $O=\{o_1,o_2,\dots,o_n\}$ is the output set of the workflow, each $o$ being a single output item in an output file; $M=\{m_1,m_2,\dots,m_n\}$ is the set of models in the workflow, each $m$ being an arbitrary model in the workflow; and $P$ is the fine-grained data provenance operation of the workflow.
As shown in Figs. 2 and 3, the provenance information capture extends the model processing frameworks of the big-data ecosystem and adds functions for generating and propagating provenance information, so that the provenance information produced during model execution is propagated through the workflow's processing models.
As shown in Fig. 4, the provenance identifier storage uses intermediate identifiers to associate the input items and output items of each model, and stores these provenance associations on HDFS as files.
As shown in Fig. 5, the provenance tracing works from the stored provenance files and, in a recursive manner, traces any result data item back to all of the inputs that produced it; tracing is performed at row-level granularity.
Control nodes in the model workflow have no effect on the data, so the analysis focuses on action nodes such as MapReduce, Hive and Spark. Taking MapReduce as an example, the MapReduce framework mainly comprises two phases:
Map phase: let the map function be M and the input data set be I. Each element i in I may produce zero or more output elements, i.e.
$$M(I)=\bigcup_{i\in I}M(\{i\})$$
Reduce phase: let the reduce function be R and the input data set be I, where each element is a key-value pair. The output of R consists of the zero or more elements produced for each group of identical keys in I. Let $k_1,k_2,\dots,k_n$ denote the distinct keys in I, and let $G_j$ consist of all key-value pairs in I whose key equals $k_j$; then
$$R(I)=\bigcup_{j\in[1,n]}R(\{G_j\})$$
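The two decompositions above can be checked with a short Python sketch. It is a plain-Python illustration, not Hadoop code: apply_map, apply_reduce and the word-count functions wc_map and wc_reduce are hypothetical names, and lists are used instead of sets so that duplicate output elements are kept. The decomposition is what makes fine-grained provenance tractable: each map output depends on exactly one input element i, and each reduce output depends on exactly one group $G_j$.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

KV = Tuple[str, int]


def apply_map(m: Callable[[str], Iterable[KV]], inputs: List[str]) -> List[KV]:
    """M(I): union of M({i}) over i in I; each element is mapped on its own."""
    out: List[KV] = []
    for i in inputs:
        out.extend(m(i))
    return out


def apply_reduce(r: Callable[[List[KV]], Iterable[KV]], pairs: List[KV]) -> List[KV]:
    """R(I): union of R over each key group G_j (all pairs whose key equals k_j)."""
    groups: Dict[str, List[KV]] = defaultdict(list)
    for k, v in pairs:
        groups[k].append((k, v))
    out: List[KV] = []
    for k_j in sorted(groups):
        out.extend(r(groups[k_j]))
    return out


def wc_map(line: str) -> List[KV]:
    return [(word, 1) for word in line.split()]


def wc_reduce(group: List[KV]) -> List[KV]:
    return [(group[0][0], sum(v for _, v in group))]


if __name__ == "__main__":
    print(apply_reduce(wc_reduce, apply_map(wc_map, ["a b", "b c b"])))
    # [('a', 1), ('b', 3), ('c', 1)]
```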
The fine-grained provenance definition comprises the following sub-steps:
S21: Single-model provenance: let any model transformation in the workflow be denoted T. Given a transformation instance $T(I)=O$ with input set $I$ and a single output element $o\in O$, fine-grained provenance must determine the subset of inputs that contributed to the output element $o$, denoted $P_T(o)\subseteq I$;
S22: Workflow provenance: workflow provenance traces back through all model transformations involved in the current workflow and is expressed recursively in terms of single-transformation provenance. The provenance of workflow W is denoted $P_W$, and the provenance of any single element $e$ in W is denoted $P_W(e)$. If $e$ is an initial input element, i.e. $e\in I_k$, then $P_W(e)=\{e\}$; otherwise, let $T$ be the transformation that outputs $e$, with $P_T(e)$ the one-level provenance of $e$, and the recursion is
$$P_W(e)=\bigcup_{e'\in P_T(e)}P_W(e')$$
The provenance information capture comprises the following sub-steps:
S31: RecordReader extension: the RecordReader wrapper combines each output key-value pair $(k_i,v_i)$ it produces with the corresponding unique identifier $q$ into $(k_i,\langle v_i,q\rangle)$ and passes it to the Mapper;
S32: Mapper extension: the Mapper wrapper takes the forwarded data $(k_i,\langle v_i,q\rangle)$ as input and decomposes it, passing the key-value pair $(k_i,v_i)$ to the underlying map function, which produces a new output key-value pair $(k_m,v_m)$; the Mapper wrapper then encapsulates $(k_m,v_m)$ together with the unique identifier $q$ as $(k_m,\langle v_m,q\rangle)$ and outputs it;
S33: Reducer extension: after receiving the outputs of the Mapper wrapper, the Reducer wrapper traverses them grouped by identical new output key $k_m$ and passes the traversed key-value pairs to the Reducer, while persistently storing the map provenance information. For each Reducer output $(k_o,v_o)$, the Reducer wrapper combines the map provenance information with the Reducer output and passes the combination to the RecordWriter wrapper;
S34: RecordWriter extension: the RecordWriter wrapper takes the provenance information processed by the Reducer wrapper as input, the RecordWriter generates a unique identifier $p$ for each output $(k_o,v_o)$, and finally the RecordWriter wrapper stores the reduce provenance information.
The provenance identifier storage comprises the following sub-steps:
S41: Map provenance storage: the provenance information produced by the map process is stored. A unique identifier $q$ is generated from the filename and offset of the input data item, a unique association identifier $k_{ID}$ is generated for each distinct group, and the pairs are stored in the map provenance file in the form $\langle q,k_{ID}\rangle$;
S42: Reduce provenance storage: the provenance information produced by the reduce process is stored. A unique identifier $p$ is generated from the filename and offset of the input data item, and the pairs are stored in the reduce provenance file in the form $\langle k_{ID},p\rangle$.
The provenance tracing comprises the following sub-steps:
S51: select the data item to be traced and issue the query;
S52: determine the file path and offset to which the data item belongs, and trace it with the backtrace method;
S53: using the naming rule between result files and provenance files, determine the name (file) of the provenance file to be queried. If the provenance file currently to be queried is a reduce provenance file, go to S54; if it is a map provenance file, go to S55; otherwise the source has been reached, so go to S56;
S54: read the reduce provenance file named by file and search its rows by binary search. The first attribute of each row is pos; find the row whose pos equals the incoming offset value, read the second attribute of that row as provenanceID, and go to S53 to call backtrace(file, provenanceID) recursively;
S55: read the map provenance file named by file and search its rows by binary search, splitting each row in turn into lineId, fileId and position; find the row whose lineId equals the incoming offset value, look up the name of the input file from fileId and file, set file to that name, and finally go to S53 to call backtrace(file, position) recursively;
S56: reaching this step means the source has been found, so the filename and input data are output directly. Execution ends when all data items have been traced.
The above is only a preferred embodiment of the present invention. It should be understood that the present invention is not limited to the form described herein, which should not be taken as excluding other embodiments; the invention can be used in various other combinations, modifications and environments, and can be modified within the scope of the concept described herein through the above teachings or the technology or knowledge of related fields. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the scope of protection of the appended claims of the present invention.
Claims (10)
1. A fine-grained data provenance method on a big-data-based model platform, characterized in that it comprises the following steps:
S1: model workflow analysis: analyzing the model workflows composed by the Oozie engine on the Hadoop platform;
S2: fine-grained provenance definition: expressing the fine-grained data provenance of a workflow in recursive form;
S3: provenance information capture: dynamically generating and acquiring provenance information during model execution;
S4: provenance identifier storage: storing the captured provenance information on HDFS in the form of associations;
S5: provenance tracing: tracing back to the source input data of the fine-grained data items produced in result data files.
2. The fine-grained data provenance method on a big-data-based model platform according to claim 1, characterized in that the model workflow is a workflow composed of control-flow nodes and action nodes on the Hadoop platform, and is interpreted and executed by the Hadoop Oozie workflow engine server.
3. The fine-grained data provenance method on a big-data-based model platform according to claim 1, characterized in that, in the fine-grained provenance definition, a workflow W on a big-data platform is given and expressed as a four-tuple $W=\{I,O,M,P\}$, where I is the input set of the workflow, O is the output set of the workflow, M is the set of models in the workflow, and P is the fine-grained data provenance operation of the workflow.
4. The fine-grained data provenance method on a big-data-based model platform according to claim 1, characterized in that the provenance information capture extends the model processing frameworks and adds functions for generating and propagating provenance information, so that the provenance information produced during model execution is propagated through the workflow's processing models.
5. The fine-grained data provenance method on a big-data-based model platform according to claim 1, characterized in that the provenance identifier storage uses intermediate identifiers to establish associations between the input items and output items of each model, and stores these provenance associations on HDFS as files.
6. The fine-grained data provenance method on a big-data-based model platform according to claim 1, characterized in that the provenance tracing works from the stored provenance files and, in a recursive manner, traces any result data item back to all of the inputs that produced it, at row-level granularity.
7. The fine-grained data provenance method on a big-data-based model platform according to claim 1, characterized in that the fine-grained provenance definition comprises the following sub-steps:
S21: single-model provenance: let any model transformation in the workflow be denoted T; given a transformation instance $T(I)=O$ with input set I and a single output element $o\in O$, fine-grained provenance must determine the subset of inputs that contributed to the output element o;
S22: workflow provenance: workflow provenance traces back through all model transformations involved in the current workflow, and is expressed recursively in terms of single-transformation provenance.
8. The fine-grained data provenance method on a big-data-based model platform according to claim 1, characterized in that the provenance information capture comprises the following sub-steps:
S31: RecordReader extension: the RecordReader wrapper passes each output key-value pair it produces, together with the corresponding unique identifier, to the Mapper;
S32: Mapper extension: the Mapper wrapper takes the forwarded data as input and decomposes it, passes the output key-value pair to the underlying map function to obtain a new output key-value pair, and encapsulates the new output key-value pair together with the unique identifier as its output;
S33: Reducer extension: after receiving the outputs of the Mapper wrapper, the Reducer wrapper traverses them grouped by identical new output key and passes the traversed key-value pairs to the Reducer, while persistently storing the map provenance information; for each Reducer output, the Reducer wrapper combines the map provenance information with the Reducer output and passes the combination to the RecordWriter wrapper;
S34: RecordWriter extension: the RecordWriter wrapper takes the provenance information processed by the Reducer wrapper as input, the RecordWriter generates a unique identifier p for each output, and finally the RecordWriter wrapper stores the reduce provenance information.
9. The fine-grained data provenance method on a big-data-based model platform according to claim 1, characterized in that the provenance identifier storage comprises the following sub-steps:
S41: map provenance storage: the provenance information produced by the map process is stored; a unique identifier q is generated from the filename and offset of the input data, a unique association identifier $k_{ID}$ is generated for each distinct group, and the pairs are stored in the map provenance file in the form $\langle q,k_{ID}\rangle$;
S42: reduce provenance storage: the provenance information produced by the reduce process is stored; a unique identifier p is generated from the filename and offset of the input data item, and the pairs are stored in the reduce provenance file in the form $\langle k_{ID},p\rangle$.
10. The fine-grained data provenance method on a big-data-based model platform according to claim 1, wherein the provenance tracing comprises the following sub-steps:
S51: selecting the data item to be traced and issuing the query;
S52: determining the file and offset to which the data item belongs, and tracing it with the backtrace method;
S53: using the naming rule between result files and provenance files, determining the name of the provenance file to be queried; if the provenance file currently to be queried is a reduce provenance file, going to S54; if it is a map provenance file, going to S55; otherwise the source has been reached, and going to S56;
S54: reading the reduce provenance file by filename and searching its rows by binary search, the first attribute of each row being pos; finding the pos equal to the incoming offset value, reading the second attribute of that row as provenanceID, and going to S53 to call backtrace(file, provenanceID) recursively;
S55: reading the map provenance file by filename and searching its rows by binary search, splitting each row in turn into lineId, fileId and position; finding the row whose lineId equals the incoming offset value, looking up the name of the input file from fileId and file and setting file to that name, and finally going to S53 to call backtrace(file, position) recursively;
S56: reaching this step means the source has been found, so the filename and input data are output directly; execution ends when all data items have been traced.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710385468.0A (CN107239523A) | 2017-05-26 | 2017-05-26 | A fine-grained data provenance method on a big-data-based model platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710385468.0A (CN107239523A) | 2017-05-26 | 2017-05-26 | A fine-grained data provenance method on a big-data-based model platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107239523A | 2017-10-10 |
Family
ID=59985232
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710385468.0A (CN107239523A, pending) | A fine-grained data provenance method on a big-data-based model platform | 2017-05-26 | 2017-05-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107239523A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104484616A (en) * | 2014-12-03 | 2015-04-01 | Inspur Electronic Information Industry Co., Ltd. | Privacy protection method under MapReduce data processing framework |
US20160012153A1 (en) * | 2014-07-08 | 2016-01-14 | Jpmorgan Chase Bank, N.A. | Capturing run-time metadata |
CN105721883A (en) * | 2014-12-05 | 2016-06-29 | Huazhong University of Science and Technology | Video sharing method and system in cloud storage system based on source tracing information |
CN106055676A (en) * | 2016-06-03 | 2016-10-26 | University of Electronic Science and Technology of China | Data source tracing method and system based on big data model analysis platform |
Non-Patent Citations (2)
Title |
---|
Robert Ikeda et al.: "Provenance for generalized map and reduce workflows", CIDR * |
Zhang Xiong et al.: "Provenance implementation for the aircraft design domain", Microelectronics & Computer * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wadoux et al. | A note on knowledge discovery and machine learning in digital soil mapping | |
CN108446540B (en) | Program code plagiarism type detection method and system based on source code multi-label graph neural network | |
CN107944629A (en) | A kind of recommendation method and device based on heterogeneous information network representation | |
CN109885782B (en) | Ecological environment space big data integration method | |
CN105706078A (en) | Automatic definition of entity collections | |
US9990403B2 (en) | System and a method for reasoning and running continuous queries over data streams | |
US20210286778A1 (en) | Automatic drift detection and handling | |
US11586838B2 (en) | End-to-end fuzzy entity matching | |
CN113254630B (en) | Domain knowledge map recommendation method for global comprehensive observation results | |
CN108710662A (en) | Language transfer method and device, storage medium, data query system and method | |
Ahsaan et al. | Big data analytics: challenges and technologies | |
US11698907B2 (en) | System and method for processing of events | |
CN113254671B (en) | Atlas optimization method, device, equipment and medium based on query analysis | |
CN105790967A (en) | Weblog processing method and device | |
JP2013003715A (en) | Trace information management device, management method, and program | |
CN107239523A (en) | A kind of fine-grained data source tracing method under the model platform based on big data | |
CN103761298A (en) | Distributed-architecture-based entity matching method | |
US9195940B2 (en) | Jabba-type override for correcting or improving output of a model | |
US20220284309A1 (en) | Aligning knowledge graphs using subgraph typing | |
US11645350B2 (en) | System and method for searching billers with service area popularity model and machine learning | |
Bertrand et al. | A novel multi-perspective trace clustering technique for IoT-enhanced processes: a case study in smart manufacturing | |
Jiang | Research and practice of big data analysis process based on hadoop framework | |
Scholtus | A generalized Fellegi-Holt paradigm for automatic error localization | |
CN105808745B (en) | A kind of data retrieval method and server | |
CN111143791B (en) | Downloaded file tracing method and system based on HashMap |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171010 |