CN107291672A - Method and apparatus for processing data tables - Google Patents
- Publication number: CN107291672A (application CN201610197071.4A)
- Authority: CN (China)
- Prior art keywords: field, data, tables, processing, identification information
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
This application discloses a method and an apparatus for processing data tables. The method includes: comparing a first field in a first data table with a second field in a second data table; when the comparison finds that the identification information of the first field and the second field differs, obtaining the processing information of the first field and the processing information of the second field, where processing information records the multiple pieces of processing logic in the processing path of the corresponding field; comparing, along the processing path, each piece of processing logic of the corresponding fields; and, if the processing logic currently being compared is inconsistent, determining that this processing logic is the logic where the difference arises. The application addresses the technical problem of low efficiency when comparing the contents of data tables.
Description
Technical field
This application relates to the field of data processing, and in particular to a method and an apparatus for processing data tables.
Background
In the prior art, data tables are compared by directly comparing their data content. Once the content is found to differ, a person has to trace back up the processing chain to locate the error.
That is, after a content difference is found, the differing data must be obtained manually, and the processing chain of that data must then be walked upstream node by node, comparing one link at a time, to locate the error. Because there are a large number of comparison tasks, this manual work is burdensome, and the manual process has a high error rate.
No effective solution has yet been proposed for the low efficiency of comparing data table contents described above.
Summary of the invention
The embodiments of this application provide a method and an apparatus for processing data tables, to at least solve the problem of low efficiency when comparing data table contents.
According to one aspect of the embodiments of this application, a method for processing data tables is provided. The method includes: comparing a first field in a first data table with a second field in a second data table; when the comparison finds that the identification information of the first field and the second field differs, obtaining the processing information of the first field and the processing information of the second field, where processing information records the multiple pieces of processing logic in the processing path of the corresponding field; comparing, along the processing path, each piece of processing logic of the corresponding fields; and, if the processing logic currently being compared is inconsistent, determining that this processing logic is the logic where the difference arises.
According to another aspect of the embodiments of this application, an apparatus for processing data tables is also provided. The apparatus includes: a first comparing unit, configured to compare a first field in a first data table with a second field in a second data table; an information obtaining unit, configured to obtain the processing information of the first field and the processing information of the second field when the comparison finds that the identification information of the first field and the second field differs, where processing information records the multiple pieces of processing logic in the processing path of the corresponding field; a second comparing unit, configured to compare, along the processing path, each piece of processing logic of the corresponding fields; and a difference locating unit, configured to determine, if the processing logic currently being compared is inconsistent, that this processing logic is the logic where the difference arises.
With the above embodiments, when a difference is found between a first field and a second field that have the same identification information in the first data table and the second data table, the processing logic of the two fields is compared automatically. If the processing logic differs, the differing logic is where the difference between the fields arose. Thus, when fields that should be identical in two data tables differ, the source of the difference can be located automatically from the processing logic of the corresponding fields, which improves accuracy. This application solves the prior-art problem of low efficiency in comparing data table contents and improves the efficiency of data table comparison.
Brief description of the drawings
The drawings described here provide a further understanding of this application and form a part of it. The illustrative embodiments of this application and their descriptions explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a computer terminal for a data table processing method according to an embodiment of this application;
Fig. 2 is a flowchart of the data table processing method according to an embodiment of this application;
Fig. 3 is a first flowchart of an optional data table processing method according to an embodiment of this application;
Fig. 4 is a second flowchart of an optional data table processing method according to an embodiment of this application;
Fig. 5 is a flowchart of the data table processing method applied to scenario one according to an embodiment of this application;
Fig. 6 is a flowchart of the data table processing method applied to scenario two according to an embodiment of this application;
Fig. 7 is a flowchart of an optional way of obtaining the processing information of data table fields according to an embodiment of this application;
Fig. 8 is a schematic diagram of an optional data table processing apparatus according to an embodiment of this application;
Fig. 9 is a schematic diagram of another optional data table processing apparatus according to an embodiment of this application;
Fig. 10 is a network environment diagram of a computer terminal according to an embodiment of this application.
Detailed description
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of this application.
Note that the terms "first", "second", and so on in the description, the claims, and the drawings distinguish similar objects and do not describe a particular order or sequence. Data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion: for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
First, the terms used in the embodiments of this application are explained as follows; these explanations do not limit the embodiments:
Online table: a table produced by a business system. Data in an online table is written to the database as a result of business operations or transitions.
Data lineage: data extracted from online tables can be turned into new data through computation or processing; the chain of computation between the online data and the new data is called lineage.
Processing path: records the name of each processing node (that is, processing step) in the course of processing the data, the order of the processing nodes, the links between processing nodes, and the processing logic of each processing node. In the embodiments of this application, processing means performing computation or logical operations on data extracted from online tables.
Processing logic: records the source field, result field, filter condition, and processing function of a processing node (the processing function may be a logical processing function).
Data caliber: the actual business meaning that the data expresses.
Table similarity: whether the fields of two tables are the same is judged mainly from attribute information such as the source of the data, the processing of the data, and the granularity of the data; the similarity of the tables is then computed from the number of identical fields.
Table quality score: measures the data quality of a table, mainly the completeness and reliability of its information.
Table health score: measures how healthily a table is used, based on its consumption of storage and computing resources.
Table access heat: describes how many times a table is used within a period of time; the more uses, the higher the heat.
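The definitions of processing logic and processing path above can be pictured as a small data model. The following is a minimal sketch only: the class names, attribute names, and sample values (`orders.amount`, `to_cny`, and so on) are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProcessingLogic:
    """Logic of one processing node (hypothetical layout)."""
    source_fields: Tuple[str, ...]      # fields the node reads
    result_field: str                   # field the node produces
    filter_condition: Optional[str]     # filter applied by the node, if any
    function: str                       # name of the processing function


@dataclass(frozen=True)
class ProcessingPath:
    """Ordered processing nodes that produce one field."""
    field_name: str
    nodes: Tuple[Tuple[str, ProcessingLogic], ...]  # (node name, logic), in order


# Illustrative values only.
step1 = ProcessingLogic(("orders.amount",), "amount_cny", "status = 'paid'", "to_cny")
step2 = ProcessingLogic(("amount_cny",), "daily_amount", None, "sum_by_day")
path = ProcessingPath("daily_amount", (("extract", step1), ("aggregate", step2)))
```

Making the classes frozen gives value equality for free, so two nodes can be compared with `==`, which is convenient when processing logic is compared node by node.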
Embodiment 1
According to the embodiments of this application, an embodiment of a method for processing data tables is provided. Note that the steps shown in the flowcharts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described can be executed in a different order.
The method embodiment provided in Embodiment 1 can be executed on a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a computer terminal as an example, Fig. 1 is a hardware block diagram of a computer terminal for a data table processing method according to an embodiment of this application. As shown in Fig. 1, computer terminal 10 can include one or more processors 102 (only one is shown in the figure; processor 102 can include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the electronic device. For example, computer terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
Memory 104 can store the software programs and modules of application software, such as the program instructions or modules corresponding to the data table processing method in the embodiments of this application. By running the software programs and modules stored in memory 104, processor 102 performs various functional applications and data processing, that is, implements the data table processing method described above. Memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, and this remote memory can be connected to computer terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations of these.
Transmission module 106 receives or sends data via a network. A specific example of such a network is a wireless network provided by the communications provider of computer terminal 10. In one example, transmission module 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station so as to communicate with the Internet. In another example, transmission module 106 can be a radio frequency (RF) module, which communicates with the Internet wirelessly.
In the above operating environment, this application provides the data table processing method shown in Fig. 2. Fig. 2 is a flowchart of the data table processing method according to an embodiment of this application.
As shown in Fig. 2, the method may include the following steps:
Step S202: compare a first field in a first data table with a second field in a second data table;
Step S204: when the comparison finds that the identification information of the first field and the second field differs, obtain the processing information of the first field and the processing information of the second field, where processing information records the multiple pieces of processing logic in the processing path of the corresponding field, and the corresponding field is the first field or the second field;
Step S206: compare, along the processing path, each piece of processing logic of the corresponding fields;
Step S208: if the processing logic currently being compared is inconsistent, determine that this processing logic is the logic where the difference arises.
With the above embodiment, when a difference is found between a first field and a second field that have the same identification information in the data tables to be analyzed, the processing logic of the two fields is compared automatically. If the processing logic differs, the differing logic is where the difference between the fields arose. Thus, when fields that should be identical in two data tables differ, the source of the difference can be located automatically from the processing logic of the corresponding fields, which improves accuracy. This application solves the prior-art problem of low efficiency in comparing data table contents and improves the efficiency of data table comparison.
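Steps S206 and S208 can be sketched as a walk along two processing paths that stops at the first node whose logic differs. This is an illustrative sketch under assumptions, not the application's implementation: `ProcessingLogic`, `locate_difference`, and the sample values are hypothetical, and treating a missing node on the shorter path as a difference is a choice made here.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ProcessingLogic:
    """Logic of one processing node (hypothetical layout)."""
    source_fields: Tuple[str, ...]
    result_field: str
    filter_condition: Optional[str]
    function: str


def locate_difference(
    first_path: Sequence[ProcessingLogic],
    second_path: Sequence[ProcessingLogic],
) -> Optional[int]:
    """Return the index of the first node whose logic differs, or None."""
    for index, (a, b) in enumerate(zip_longest(first_path, second_path)):
        if a != b:  # also true when one path is shorter (a or b is None)
            return index
    return None


shared = ProcessingLogic(("orders.amount",), "amount_cny", None, "to_cny")
old = ProcessingLogic(("amount_cny",), "daily_amount", "status = 'paid'", "sum_by_day")
new = ProcessingLogic(("amount_cny",), "daily_amount", None, "sum_by_day")

print(locate_difference([shared, old], [shared, new]))  # prints 1
```

Here the two paths agree on the first node and differ in the filter condition of the second, so the second node (index 1) is reported as the logic where the difference arises.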
The identification information is information used to identify a field; the identification information of a field points to that field, for example the field name or the field's processing logic.
Step S202, comparing the first field in the first data table with the second field in the second data table, can be implemented by checking whether the fields that originally had the same identification information in the data tables to be analyzed continue to meet a predetermined comparison condition. If they still meet the condition, step S202 is performed again; if they no longer meet it, a difference has appeared between the fields with the same identification information in the data tables to be analyzed. In essence, the difference appears because one of these fields has undergone further, new processing, so steps S204 to S208 can locate the processing logic that caused the difference.
The predetermined comparison condition can be determined by the comparison scenario and can include: the field names are the same; the field processing logic is the same; or the field metadata and processing logic are the same.
In the above embodiment, the processing information obtained in step S204 records the processing logic in the processing path of the corresponding field. The processing logic can include at least one of the following for the corresponding processing node: source field, result field, filter condition, and processing function.
When processing logic is compared, the source field, result field, filter condition, and processing function of the processing nodes can be compared. If the processing logic currently being compared is inconsistent, it is determined to be the differing logic, which locates the position where the difference arises.
In one optional scheme, before the first field in the first data table is compared with the second field in the second data table, specification information is obtained, where the specification information specifies the first field and the second field.
That is, the user can specify the comparison field for a target field in the data tables to be analyzed; the target field and the comparison field are then taken as the first field in the first data table and the second field in the second data table.
With this embodiment, the fields to be compared can be specified directly. This scheme can be applied to data migration: the data before and after the migration is monitored to verify whether the migration is complete, and if it is found to be incomplete, the cause of the difference is located automatically through field lineage.
In another optional scheme, before the first field in the first data table is compared with the second field in the second data table, the identification information of the first field is obtained and used to determine the second field in the second data table that has the same identification information as the first field.
Specifically, the identification information (for example, the processing information) of the target field of the first data table is obtained, and a matching field that is identical to the target field and belongs to another data table is determined (that is, the second field, in another data table, whose identification information is the same as that of the target field).
Besides specifying a comparison field, the field that exactly matches the target field of the first data table can be extracted through the field lineage and the field lineage path of the target field. The matching field can be located in another data table; in that case the processing paths of the target field and the matching field are exactly the same, and the predetermined comparison rule (that is, the predetermined comparison condition above) is derived in the step of determining the matching field.
Optionally, identical identification information means: the field names are the same, or the field metadata and the processing logic are the same (for example, the same metadata is processed with the same processing logic into fields with the same identification information).
In one optional embodiment, the identification information includes the field name, and comparing the first field in the first data table with the second field in the second data table includes: comparing whether the field names of the first field and the second field are the same; if the field names differ, the comparison finds that the identification information of the first field and the second field differs.
In another optional embodiment, the identification information includes the field metadata and the processing logic, and comparing the first field in the first data table with the second field in the second data table includes: comparing whether the field metadata and processing logic of the first field and the second field are the same; if they differ, the comparison finds that the identification information of the first field and the second field differs.
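The two optional embodiments above (comparison by field name, or by field metadata plus processing logic) can be sketched as one function with a mode switch. The dictionary keys `name`, `metadata`, and `logic`, the mode strings, and the sample fields are assumptions made for illustration.

```python
from typing import Any, Mapping


def identification_differs(
    first: Mapping[str, Any],
    second: Mapping[str, Any],
    mode: str = "name",
) -> bool:
    """Return True when the identification information of two fields differs."""
    if mode == "name":
        return first["name"] != second["name"]
    if mode == "metadata_and_logic":
        return (first["metadata"], first["logic"]) != (second["metadata"], second["logic"])
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical fields: different names, same metadata and processing logic.
field_a = {"name": "pay_amt", "metadata": {"type": "decimal"}, "logic": ("to_cny", "sum_by_day")}
field_b = {"name": "pay_amount", "metadata": {"type": "decimal"}, "logic": ("to_cny", "sum_by_day")}

print(identification_differs(field_a, field_b, mode="name"))                # True
print(identification_differs(field_a, field_b, mode="metadata_and_logic"))  # False
```

The same pair of fields can therefore count as differing under one comparison condition and as identical under another, which is why the condition is chosen per comparison scenario.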
Taking two data tables as an example, the comparison approach in the embodiments of this application is described in detail below with reference to Fig. 3. As shown in Fig. 3, the embodiment can be implemented through the following steps:
Step S301: start the data comparison mode.
Step S302: detect whether the comparison data of the target field has been specified in the system.
If the comparison data of the target field has been specified, step S304 is performed; if it has not been specified, step S303 is performed.
The user can specify the comparison field b (that is, the comparison data) for target field a in the first data table A (the first field in the above embodiment). The comparison field belongs to the second data table B. When the comparison data of the target field has been specified, the system takes the target field as the first field and comparison field b as the second field.
Step S303: obtain the matching field according to the field lineage of the target field.
Specifically, the system can determine the matching field that is identical to the target field through the field lineage and the field lineage path. For example, the system determines that target field a in the first data table A needs to be monitored and obtains the field lineage of target field a; if the field lineage of some field is the same as that of the target field, that field is determined to be the matching field identical to the target field.
Step S304: obtain the monitoring rules.
While steps S302 and S303 are performed, the monitoring rules (that is, the predetermined comparison rules) can be derived from the field lineage, for example: processing logic + online data = offline field; field names are identical between tables; and similar predetermined comparison rules.
Step S305: judge whether the identical fields violate the monitoring rules.
If the identical fields (that is, the first field and the second field) violate the monitoring rules, it is determined that a difference has appeared between them, and step S306 is performed; if they do not violate the monitoring rules, monitoring continues.
For example, when the monitoring rule is that field names are identical between tables, a rule violation because the field names differ probably means that the lineage of the original field has changed, so the lineage needs to be obtained again.
Step S306: obtain the field lineage of the first field and the second field.
Optionally, because the field lineage has changed, the field lineage can be recalculated.
Step S307: locate the problem by deriving the path from the field lineage.
The lineage before and after the change is compared, along with the output of each step in the lineage; if the information of some processing node in the lineage is inconsistent, that processing node is automatically located as the node where the problem occurs.
With this embodiment, data is cross-checked according to the lineage between data (the lineage can be recorded in the processing path). For example, comparison rules are configured between the first-layer source data (such as data extracted from online tables, or data in online tables) and the end-consumption data; once the rules are configured, problems can be flagged early, and the cause of a difference is located automatically through field lineage.
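Step S303 above, finding the matching field whose lineage equals that of the target field, can be sketched as follows. The encoding of lineage as a tuple of step descriptions, the `(table, field)` keys, and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

FieldKey = Tuple[str, str]   # (table name, field name)
Lineage = Tuple[str, ...]    # ordered processing steps, hypothetical encoding


def find_matching_fields(
    target: FieldKey,
    lineages: Dict[FieldKey, Lineage],
) -> List[FieldKey]:
    """Return fields in other tables whose lineage equals the target's lineage."""
    target_lineage = lineages[target]
    return [
        key
        for key, lineage in lineages.items()
        if key[0] != target[0] and lineage == target_lineage
    ]


lineages = {
    ("A", "a"): ("extract:orders.amount", "to_cny", "sum_by_day"),
    ("B", "b"): ("extract:orders.amount", "to_cny", "sum_by_day"),
    ("B", "c"): ("extract:orders.amount", "sum_by_day"),
}

print(find_matching_fields(("A", "a"), lineages))  # [('B', 'b')]
```

Field `b` in table B shares the full lineage of target field `a`, so it is returned as the matching field; field `c` skips one step and is not.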
Based on the above embodiment, this application also provides a way of determining the similarity of data tables.
Specifically, before the first field in the first data table is compared with the second field in the second data table, the method can further include: obtaining the processing information of each field of each data table among the data tables to be analyzed, where the processing information of a field at least records each piece of processing logic in the processing path of the corresponding field; using the processing logic in the processing information to judge whether the fields are fields with the same identification information, obtaining a judgment result; counting, according to the judgment result, the number of fields with the same identification information shared by each pair of data tables among the data tables to be analyzed; calculating the similarity of each pair of data tables from the number of fields with the same identification information that they share; and obtaining multiple second data tables whose similarity to the first data table meets a preset similarity condition.
In one optional embodiment, when two fields have the same data granularity, if the source field of their first processing node is consistent and the result field of their last processing node is consistent, the two fields are fields with the same identification information.
In another optional embodiment of the above similarity determination method, whether two fields are fields with the same identification information can be determined from each piece of processing logic on the processing paths of the fields.
Specifically, using the processing logic in the processing information to judge whether the fields are fields with the same identification information, and obtaining the judgment result, can include: if every piece of processing logic of the two fields is consistent, judging that the two fields have the same identification information; if the two fields have different processing logic, judging that the two fields have different identification information. The judgment result includes information on the fields with the same identification information and information on the fields with different identification information.
Note further that the processing logic can include the filter condition, processing function, source data, and result data in the corresponding processing path.
Optionally, if all the information in a piece of processing logic is consistent, the processing logic is consistent; if the source data in the processing logic is consistent but the result data is not, the processing logic is necessarily inconsistent.
In another optional scheme, the source of field a is q and the source of field b is also q; field a has 4 processing nodes and field b has 5. Field a and field b may still be the same field. For example, the first 3 processing nodes are consistent, the result of field a's 4th processing node is m (that is, the attribute value of field a), and the result of field b's 4th processing node is n, but the result of field b's 5th processing node is m; the two fields are then also fields with the same identification information.
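The endpoint rule above (same granularity, same source at the first node, same result at the last node) can be checked on the example of field a with 4 nodes and field b with 5 nodes. This is a sketch under assumptions: each node is reduced to a hypothetical `(source, result)` pair, the intermediate names `r1` to `r3` are invented, and granularity is modelled as a plain string.

```python
from typing import Sequence, Tuple

Node = Tuple[str, str]  # (source field, result field) of one processing node


def same_by_endpoints(
    path_a: Sequence[Node],
    path_b: Sequence[Node],
    granularity_a: str,
    granularity_b: str,
) -> bool:
    """Apply the endpoint rule: same granularity, same first source, same last result."""
    if granularity_a != granularity_b or not path_a or not path_b:
        return False
    same_source = path_a[0][0] == path_b[0][0]
    same_result = path_a[-1][1] == path_b[-1][1]
    return same_source and same_result


# Field a: 4 nodes ending in m. Field b: 5 nodes, 4th result n, 5th result m.
path_a = [("q", "r1"), ("r1", "r2"), ("r2", "r3"), ("r3", "m")]
path_b = [("q", "r1"), ("r1", "r2"), ("r2", "r3"), ("r3", "n"), ("n", "m")]

print(same_by_endpoints(path_a, path_b, "day", "day"))  # True
```

The rule deliberately ignores the number of intermediate nodes, which is why paths of different length can still describe the same field.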
With the above embodiment, when the field lineage (that is, the processing information) of two fields is compared, one of them can be set as the reference field. For example, the reference field is taken as the target field and the other field as the comparison field, and the processing logic in each processing node of the target field is compared with the processing logic in the processing nodes of the comparison field. Suppose the target field has n processing nodes in total and the comparison field has m processing nodes. In the embodiments of this application, the source field is the source data and the result field is the result data.
Optionally, the field names of the fields can be compared first. If the field names of two fields differ, the two fields have different identification information. If the field names are the same, the processing logic of the first processing node of the two fields is compared, for example by checking the source data of the first processing node of each field; if the source data of the first processing nodes is inconsistent, the two fields are different fields.
Further, to ensure the accuracy of the fields identified as having the same identification information, the lineage of the intermediate processing nodes is verified. If the source data of the first processing node of the two fields is consistent, the result data of processing node x of the target field can be compared with the result data in each piece of processing logic of the comparison field. If the result of the y-th processing node of the comparison field is consistent with the result data of processing node x, then the remaining processing nodes of the target field (between x and n) are verified using the processing logic of the remaining processing nodes of the comparison field (between y and m).
The embodiment of data purification is described in detail with reference to Fig. 4, as shown in figure 4, the embodiment may include steps of:
Step S401:The machining information of each field of each tables of data in tables of data to be analyzed is obtained, wherein, field
Machining information be at least used to record each processing logic in the machining path of corresponding field.
Step S402:Whether using the processing logic in machining information, it is identification information identical word to judge each field
Section, obtains judged result.
Alternatively, two fields data granularity under similar circumstances, if first of two fields processing node
Source field is consistent, and the result field of last processing node is consistent, then two fields are identification information identical word
Section, is otherwise the different field of identification information.
Step S403: According to the judgment result, count the number of fields with identical identification information between each pair of data tables among the data tables to be analyzed.
Step S404: Based on the number of fields with identical identification information between each pair of data tables, calculate the similarity of each pair of data tables.
After the number of fields with identical identification information in the two data tables under analysis is obtained, this step may be implemented as follows:
The similarity P of a pair of data tables is calculated according to the following formula:
P = Y*2/(M+N), where, in this embodiment, Y is the number of fields with identical identification information shared by the pair of data tables, M is the number of fields of one data table in the pair, and N is the number of fields of the other data table in the pair.
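As a small worked illustration of the formula, the following Python sketch computes P from the three counts. The function name `table_similarity` and the zero-field guard are assumptions added for the example.

```python
def table_similarity(shared: int, fields_a: int, fields_b: int) -> float:
    """Compute P = Y*2/(M+N) for one pair of data tables.

    shared   -- Y, fields with identical identification information in both tables
    fields_a -- M, number of fields of one table
    fields_b -- N, number of fields of the other table
    """
    if fields_a + fields_b == 0:
        # Assumption: the text does not define P for two empty tables.
        raise ValueError("at least one table must have fields")
    return 2 * shared / (fields_a + fields_b)


# Two tables with 10 fields each that share 9 fields: P = 18 / 20 = 0.9
print(table_similarity(9, 10, 10))
```

With this definition P ranges from 0 (no shared fields) to 1 (both tables consist entirely of shared fields).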
The similarity of any two data tables can be calculated with the above method, and this similarity calculation can be applied in data recommendation and data purification scenarios.
After a plurality of second data tables whose similarity with the first data table meets a preset similarity condition are obtained, the method may further include: sorting the plurality of second data tables according to a health attribute and a quality attribute to obtain ranking information, where the health attribute characterizes the resource consumption value of a data table, and the quality attribute at least characterizes the completeness and reliability of the information in a data table.
The preset similarity condition includes: the similarity is greater than a preset threshold; the data table ranks in the top N after the data tables similar to the first data table are sorted by similarity.
For example, after the similarity of each pair of data tables is determined by the above scheme, the data tables whose similarity with the first data table is greater than a preset threshold (such as 90%) are taken as the second data tables. The health attribute (such as the table's health score) and the quality attribute (such as the table's quality score) of each second data table are obtained, and the second data tables are sorted by health score and quality score (when sorting, the weighted result of the health score and the quality score may be used as the table's ranking score) to obtain the ranking information of the second data tables. The data tables ranked first in the ranking information are those that are more closely related to the first data table and have better quality and health.
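The filtering and ranking just described can be sketched as follows. The dictionary keys, the function name `rank_candidates` and the equal default weights are assumptions for illustration; the text only states that a weighted result of the health score and quality score may serve as the ranking score.

```python
from typing import Dict, List


def rank_candidates(candidates: List[Dict],
                    threshold: float = 0.9,
                    health_weight: float = 0.5,
                    quality_weight: float = 0.5) -> List[Dict]:
    """Keep tables whose similarity exceeds the threshold, best ranking score first."""
    kept = [c for c in candidates if c["similarity"] > threshold]

    def ranking_score(table: Dict) -> float:
        # Weighted result of health score and quality score (weights are assumed).
        return health_weight * table["health"] + quality_weight * table["quality"]

    return sorted(kept, key=ranking_score, reverse=True)


tables = [
    {"name": "t1", "similarity": 0.95, "health": 80, "quality": 70},
    {"name": "t2", "similarity": 0.99, "health": 90, "quality": 90},
    {"name": "t3", "similarity": 0.50, "health": 100, "quality": 100},
]
print([t["name"] for t in rank_candidates(tables)])
```

In this example `t3` is dropped because its similarity is below the threshold, even though its health and quality scores are the highest.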
The above data processing method of the application can be applied in the following scenarios:
Before the machining information of each field of each data table among the data tables to be analyzed is obtained, a push request for obtaining tables similar to the first data table is received, and the data tables to be analyzed are obtained based on the push request, where the data tables to be analyzed include the first data table; that is, the method is applied in a data pushing scenario;
A processing task for processing data is received, the identifier of the first data table is extracted from the processing task, and the data tables to be analyzed are obtained using the identifier of the first data table; that is, the method can be applied to replacing a data table in a task;
A cleanup task for cleaning up the first data table is received, and the data tables to be analyzed are obtained based on the cleanup task; that is, the method can be applied in data cleanup.
Specifically, after the ranking information is obtained, the method may further include: when a push request has been received, using the ranking information as the push information in response to the push request; when a processing task has been received, replacing the first data table in the processing task with the first-ranked second data table in the ranking information; when a cleanup task has been received, merging the first q second data tables in the ranking information with the first data table, where q is a natural number.
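The three ways of using the ranking information can be pictured as a simple dispatch. This is a hedged sketch: the request type strings, the returned dictionaries and the function name `handle_ranking` are hypothetical, and the actual replacement or merge of tables is not modelled.

```python
from typing import Dict, List


def handle_ranking(request_type: str, first_table: str,
                   ranking: List[str], q: int = 1) -> Dict:
    """Decide what to do with ranked second data tables for each request type."""
    if not ranking:
        # Assumption: with no qualifying second table there is nothing to do.
        return {"action": "none"}
    if request_type == "push":
        # Push request: the ranking itself is the push information.
        return {"action": "push", "tables": list(ranking)}
    if request_type == "processing":
        # Processing task: replace the first table with the top-ranked table.
        return {"action": "replace", "old": first_table, "new": ranking[0]}
    if request_type == "cleanup":
        # Cleanup task: merge the first q ranked tables with the first table.
        return {"action": "merge", "tables": [first_table] + list(ranking[:q])}
    raise ValueError(f"unknown request type: {request_type}")
```

For instance, `handle_ranking("cleanup", "t0", ["t2", "t1", "t3"], q=2)` would describe a merge of `t0` with `t2` and `t1`.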
Taking data pushing as the application scenario, the embodiment of the application is described in detail below with reference to Fig. 5.
As shown in Fig. 5, the embodiment may include the following steps:
Step S501: Obtain the data table name in the push request.
Step S502: Obtain the field lineage of each field in the data table according to the data table name.
Step S503: Determine the fields with identical identification information in the tables according to the field lineage of each field.
The way of determining the fields with identical identification information is consistent with the implementation in the above embodiments and is not repeated here.
Step S504: Calculate the similarity of two data tables according to the number of fields with identical identification information in the two data tables.
Step S505: Sort in descending order by similarity, health score and quality score, and make recommendations.
The processing of this step is consistent with that in the above embodiments and is not repeated here.
In the above embodiment, the similarity of tables is calculated from the number of fields with identical identification information: similarity = number of fields with identical identification information * 2 / (number of fields in table A + number of fields in table B). When a user searches for a table, the tables whose similarity exceeds a certain range are recommended to the user in descending order of quality score and health score.
With the above embodiment, tables with high similarity can be ranked in search results by health score and quality score, so that better data is recommended to consumers. Through the consumers' choices, homogeneous data with few downstream applications can gradually be cleaned up and taken offline, achieving intelligent optimization of data applications.
Taking data pushing as the application scenario, the embodiment of the application is described in detail below with reference to Fig. 6.
Step S601: Obtain the data table name in the processing task request.
The data table name in all embodiments of the application may be an ID.
Step S602: Obtain the field lineage of each field in the data table according to the data table name.
Step S603: Determine the fields with identical identification information between tables according to the field lineage of each field.
The way of determining the fields with identical identification information is consistent with the implementation in the above embodiments and is not repeated here.
Step S604: Calculate the similarity of two data tables according to the number of fields with identical identification information in the two data tables.
A table whose similarity exceeds a certain threshold may be used as a substitute table for the data table in the replacement task.
Step S605: Sort in descending order by health score and quality score, and make recommendations.
Step S606: Judge whether all tasks have been traversed.
If so, end; otherwise, return to step S602.
The data table in the replacement task can be replaced by a substitute table with high similarity, a high health score and a high quality score.
With the above embodiment, the similarity between pairs of data tables can be used to determine, for the tables and fields referenced by a task, whether there is a substitute table with higher quality and health scores, and to guide the user to use the better table.
The application also provides a scheme for periodically cleaning up tables with a low access level. Its specific processing is consistent with the processing described above. In this application scenario, storage and computing resources can be released and the data architecture optimized, for example by merging tables with high similarity and making them compatible. Compatibility can be achieved through table joins. For example, suppose the similarity of the first data table and the second data table is 99%, which exceeds the preset threshold of 90%. If the health score and quality score of the second data table are both higher than those of the first data table, the second data table can replace the first data table. If the evaluation score determined from the health score and quality score of the second data table is higher than the evaluation score of the first data table, the second data table can also replace the first data table. Of course, in these cases, instead of replacing the first data table with the second data table, the second data table and the first data table may be joined, and the join result used to replace both the first data table and the second data table.
Specifically, if the table fields referenced in an existing task are identical to those of other tables, they can be replaced with the other tables. According to the tables' health scores and quality scores, the user is asked to switch to the better table, so that homogeneous data with few downstream applications can gradually be cleaned up and taken offline, achieving intelligent optimization of data applications.
In the prior art, when same-source tables are synchronized and cleaned up, only one layer of lineage is used. That is, when data is extracted from online to offline, only the online tables from which the offline data was extracted are checked for being the same, which yields the tables that were repeatedly extracted from the same source; one of them is kept and the rest are taken offline. In this operation, although same-source data tables come from the same data table, different tables may be processed differently, so same-source tables may in fact record different information. Deciding whether data tables are identical merely by judging their source is therefore not sound.
The application instead determines the fields with identical identification information through the field lineage of the fields in the data tables, and determines the similarity of two data tables based on the number of such fields, rather than judging two tables to be identical merely because their source tables are the same. The judgment used in the application compares and analyzes the processing of the fields of two data tables, and can therefore distinguish same-source data tables that, despite having the same source, do not record the same content.
Obtaining the machining information in the embodiments of the application includes: using the processing code of the data table where the corresponding field is located, parsing the source table of each processing node in the machining path of the corresponding field until the source table is an extract table of an online table; and recording the processing logic of each processing node, where the processing logic includes a source field and a result field, and further includes a filter condition and/or a processing function.
It should be noted that any embodiment of the application can determine the field lineage of a field in the above manner, that is, determine the machining information of the field.
The embodiment of the application is described in detail below with reference to Fig. 7. As shown in Fig. 7, the embodiment may include the following steps:
Step S701: Input the table fields.
In the embodiment of the application, the following operations need to be performed on each field in the table.
Step S702: Determine the primary key of the table based on the table fields.
If the data table is an order table that records information such as order number and purchaser, the primary key of the data table can be determined from the number of entries in the data table and the number of entries corresponding to each field. If the number of entries of a field is consistent with the number of entries in the data table, that field is the primary key of the data table. For example, if the order table above records 100 orders, with 100 order numbers but only 60 purchasers, the order number field is the primary key of the order table.
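A minimal Python sketch of this primary key rule follows. It assumes that "the number of entries of a field" means the number of distinct values in that column, which matches the order number and purchaser example; the function name `find_primary_key` and the row representation are hypothetical.

```python
from typing import Dict, List, Optional


def find_primary_key(rows: List[Dict[str, object]]) -> Optional[str]:
    """Return the first field whose distinct-value count equals the row count."""
    if not rows:
        return None
    total = len(rows)
    for field in rows[0]:
        distinct_values = {row[field] for row in rows}
        if len(distinct_values) == total:
            return field
    return None  # no single field identifies every row


# 100 orders, 100 distinct order numbers, only 60 distinct purchasers
orders = [{"purchaser": i % 60, "order_no": i} for i in range(100)]
print(find_primary_key(orders))
```

A production implementation would more likely read declared keys from table metadata or run a distinct count in the warehouse; the in-memory scan here is only for illustration.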
Step S703: Record the upper-layer source table of the primary key.
The processing code of the data table can be obtained, and the upper-layer source table of the data table parsed from that processing code. Similarly, the source field of each field can also be read from the processing code of the data table.
Step S704: Record the filter condition of the processing node.
Processing a data table may involve filtering the data table. The filter condition based on the upper-layer source table is read from the processing code, and the associated table corresponding to the filter condition of the processing node is obtained.
Table association in the embodiments of the application may include both direct field filtering and join filtering between tables.
Step S705: Judge whether the associated table of the processing node has filtered data.
If the associated table of the processing node has filtered data, step S706 is performed; if the associated table of the processing node has not filtered data, the associated table is judged to be an extract table, and the process ends.
An extract table in the embodiments of the application is a table generated from data extracted from an online table.
Step S706: Record the associated table and the fields in the table.
Step S707: Record the processing function on the field.
Step S708: Judge whether the upper-layer source table and the associated table are both extract tables of online tables.
If so, the lineage parsing is complete; if not, return to step S703.
In the above embodiments of the application, the fields of the same attribute mentioned in the foregoing embodiments refer to fields with identical identification information. Field lineage parsing may specifically start from one field of one table: the primary key of the table is determined, and then the upper-layer source table of the primary key field (which may include the above source field), the result field, the filter condition (including direct field filtering and join filtering between tables) and the function used (that is, the processing function in the above embodiments) are parsed. If the upper-layer source table is not an online table, or the associated table in the filter condition is not an extract table of an online table, tracing continues upward in the manner shown in Fig. 7 until all tables upstream and in the filter conditions are extract tables of online tables. The path traced at each step is recorded to generate the machining information.
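The upward tracing can be sketched as a breadth-first walk over table metadata. This is a simplified, table-level illustration only: the `catalog` dictionary stands in for information that the text says is parsed from each table's processing code, and every key and name in it is hypothetical.

```python
from collections import deque
from typing import Dict, List


def trace_lineage(table: str, catalog: Dict[str, Dict]) -> List[Dict]:
    """Walk upstream from `table` until only extract tables remain.

    Returns one record per processing node visited, in the order traced.
    """
    path: List[Dict] = []
    pending = deque([table])
    seen = set()
    while pending:
        current = pending.popleft()
        if current in seen:
            continue
        seen.add(current)
        node = catalog[current]
        if node["is_extract"]:
            continue  # extract table of an online table: stop tracing this branch
        associated = node.get("associated_tables", [])
        path.append({
            "table": current,
            "source_table": node["source_table"],
            "filter": node.get("filter"),
            "associated_tables": associated,
            "function": node.get("function"),
        })
        # Continue with the upper-layer source table and any associated tables.
        pending.append(node["source_table"])
        pending.extend(associated)
    return path


catalog = {
    "report": {"is_extract": False, "source_table": "dwd_orders",
               "filter": "status = 'paid'", "associated_tables": ["dim_user"],
               "function": "sum"},
    "dwd_orders": {"is_extract": False, "source_table": "ods_orders"},
    "dim_user": {"is_extract": True},
    "ods_orders": {"is_extract": True},
}
print([step["table"] for step in trace_lineage("report", catalog)])
```

The sketch omits per-field source and result fields for brevity; the method described above records those for each processing node as well.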
With the above embodiments, not only can data differences be located automatically, but data can also be purified intelligently. By recommending to consumers data that is highly similar and performs well in the system, a survival-of-the-fittest mechanism is formed. Data that performs poorly in the system is used less and less and can eventually be taken offline entirely, which reduces existing data storage and optimizes the data architecture.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions, but those skilled in the art should know that the application is not limited by the described order of actions, because according to the application some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily required by the application.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of the application.
Embodiment 2
According to an embodiment of the application, a data table processing apparatus for implementing the above data table processing method is also provided. As shown in Fig. 8, the apparatus includes: a first comparing unit 81, an information acquisition unit 83, a second comparing unit 85 and a difference locating unit 87.
The first comparing unit is configured to compare a first field in a first data table with a second field in a second data table;
The information acquisition unit is configured to obtain, when the comparison finds that the identification information of the first field and the second field differs, the machining information of the first field and the machining information of the second field, where the machining information records the multiple processing logics in the machining path of the corresponding field;
The second comparing unit is configured to compare, along the machining path, each processing logic of each corresponding field;
The difference locating unit is configured to determine, if the processing logic currently compared is inconsistent, that the processing logic currently compared is the logic in which the difference arises.
With the above embodiment, when a comparison finds that a difference has appeared between a first field and a second field that have identical identification information in the data tables to be analyzed, the processing logic of the first field and the second field is compared automatically. If the processing logic differs, the differing logic is where the difference between the fields with identical identification information arose.
With the above embodiment, when fields that should be identical in two data tables differ, the place where the difference arose can be located automatically according to the processing logic of the corresponding fields, which improves processing accuracy. The application thus solves the prior-art problem of low efficiency when comparing data table content and improves the efficiency of data table comparison.
The above identification information is information for identifying a field; the identification information of a field points to one field, for example the field name or the field processing logic.
Comparing the first field in the first data table with the second field in the second data table can be implemented by checking whether fields in the data tables to be analyzed that originally had identical identification information continue to meet a predetermined comparison condition. If those fields continue to meet the predetermined comparison condition, the process may return to the operation of comparing the first field in the first data table with the second field in the second data table. If the comparison finds that those fields no longer meet the predetermined comparison condition, a difference has appeared among the fields with identical identification information in the data tables to be analyzed. In essence, the difference appears because new processing has been applied to a field with identical identification information, and the above apparatus can locate the processing logic that caused the difference.
The predetermined comparison condition in the above embodiment may be determined based on the comparison scenario, and may include: identical field names, identical field processing logic, identical field metadata and processing logic.
In the above embodiment, the obtained machining information of a field records the processing logic in the machining path of that field, and the processing logic may include at least one of the following: the source field, result field, filter condition and processing function of the corresponding processing node.
When processing logic is compared, the source field, result field, filter condition and processing function of the processing nodes can be compared. If the processing logic currently compared is inconsistent, the processing logic currently compared is determined to be the differing logic, which locates the position where the difference arises.
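The node-by-node comparison can be sketched as below. The `Logic` tuple and the function `locate_difference` are hypothetical names for illustration; treating a path that is longer than the other as differing at the first extra node is an assumption, since the text does not address paths of unequal length.

```python
from typing import NamedTuple, Optional, Sequence


class Logic(NamedTuple):
    """Processing logic of one node (hypothetical structure)."""
    source_field: str
    result_field: str
    filter_condition: Optional[str] = None
    function: Optional[str] = None


def locate_difference(path_a: Sequence[Logic],
                      path_b: Sequence[Logic]) -> Optional[int]:
    """Return the index of the first inconsistent processing logic, or None."""
    for index, (logic_a, logic_b) in enumerate(zip(path_a, path_b)):
        # Tuple comparison covers source field, result field, filter and function.
        if logic_a != logic_b:
            return index
    if len(path_a) != len(path_b):
        # Assumption: extra processing on one side counts as the difference.
        return min(len(path_a), len(path_b))
    return None


before = [Logic("ods.amount", "tmp.amount"),
          Logic("tmp.amount", "gmv", "status = 'paid'", "sum")]
after = [Logic("ods.amount", "tmp.amount"),
         Logic("tmp.amount", "gmv", "status = 'done'", "sum")]
print(locate_difference(before, after))
```

Here the two paths agree at the first node and differ only in the filter condition of the second node, so the second node is reported as the location of the difference.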
In another optional embodiment, the apparatus further includes a field determining unit, configured to obtain the identification information of the first field (such as its machining information) and determine the second field in the second data table that has the same identification information as the first field.
Besides specifying the fields to compare, the field lineage of the target field of the first data table can be used: a matching field with the same identification information as the target field of the first data table is retrieved through the field lineage path, and the matching field may be located in another data table. In this case, the machining paths of the target field and the matching field are exactly the same, and the predetermined comparison rule (that is, the above predetermined comparison condition) is derived through the above step of determining the matching field.
Optionally, identical identification information includes: identical field names, or identical field metadata and processing logic (for example, the same metadata processed with the same processing logic yields fields with identical identification information).
The identification information includes a field name, and the first comparing unit includes: a first comparison module, configured to compare whether the field names of the first field and the second field are the same; and a first difference determining module, configured to determine, if the field names of the first field and the second field are different, that the identification information of the first field and the second field differs.
In an optional embodiment, the identification information includes field metadata and processing logic, and the first comparing unit includes: a second comparison module, configured to compare whether the field metadata and processing logic of the first field and the second field are the same; and a second difference determining module, configured to determine, if the field metadata and processing logic of the first field and the second field are different, that the identification information of the first field and the second field differs.
In an optional embodiment, the apparatus further includes a field specifying unit, configured to obtain specifying information before the first field in the first data table is compared with the second field in the second data table, where the specifying information specifies the first field and the second field.
With the above embodiment, the fields to be compared can be specified directly. This scheme can be applied during data migration: the data before and after migration can be monitored to verify whether the migration is complete, and if the migration is found to be incomplete, the cause of the difference is located automatically through field lineage.
According to the above embodiment of the application, the apparatus may further include, as shown in Fig. 9: an information acquisition unit 91, configured to obtain, before the first field in the first data table is compared with the second field in the second data table, the machining information of each field of each data table among the data tables to be analyzed, where the machining information of a field at least records each processing logic in the machining path of that field; a judging unit 93, configured to judge, using the processing logic in the machining information, whether the fields are fields with identical identification information, and obtain a judgment result; a statistics unit 95, configured to count, according to the judgment result, the number of fields with identical identification information between each pair of data tables among the data tables to be analyzed; a computing unit 97, configured to calculate the similarity of each pair of data tables based on the number; and a table acquiring unit 99, configured to obtain a plurality of second data tables whose similarity with the first data table meets a preset similarity condition.
By comparing the lineage before and after, and comparing the output result of each step in the lineage, if the information of some processing node in the lineage is inconsistent, that processing node is automatically located as the node where the problem occurred.
With the above embodiment, data is cross-checked according to the lineage between data. For example, comparison rules are configured between the first-layer source data (such as data extracted from an online table, or data in the online table) and the end consumption data, such as rules between fields with identical identification information, so that problems are warned of in advance and the cause of a difference is located automatically through field lineage.
Based on the above embodiments, the application also provides an apparatus for determining the similarity of data tables.
Specifically, the judging unit includes: a first judging module, configured to judge, if every processing logic of two fields is consistent, that the two fields have identical identification information; and a second judging module, configured to judge, if the two fields have any differing processing logic, that the two fields have different identification information, where the judgment result includes information on the fields with identical identification information and information on the fields with different identification information.
In an optional embodiment, when the data granularity of two fields is the same, if the source fields of their first processing nodes are consistent and the result fields of their last processing nodes are consistent, the two fields have identical identification information.
Specifically, the computing unit is configured to:
calculate the similarity P of a pair of data tables according to the following formula:
P = Y*2/(M+N), where Y is the number of fields with identical identification information shared by the pair of data tables, M is the number of fields of one data table in the pair, and N is the number of fields of the other data table in the pair.
According to the above embodiment of the application, the apparatus may further include a sorting unit, configured to sort, after a plurality of second data tables whose similarity with the first data table meets the preset similarity condition are obtained, the plurality of second data tables according to a health attribute and a quality attribute to obtain ranking information, where the health attribute characterizes the resource consumption value of a data table, and the quality attribute at least characterizes the completeness and reliability of the information in a data table.
Further, the apparatus also includes a receiving unit, configured to receive at least one of the following before the machining information of each field of each data table among the data tables to be analyzed is obtained: a push request for obtaining tables similar to the first data table, where the data tables to be analyzed are obtained based on the push request and include the first data table; a processing task for processing data, where the identifier of the first data table is extracted from the processing task and the data tables to be analyzed are obtained using the identifier of the first data table; a cleanup task for cleaning up the first data table, where the data tables to be analyzed are obtained based on the cleanup task.
It should further be noted that the apparatus also includes an information output unit, configured to output information, after the ranking information is obtained, in one of the following ways: when a push request has been received, using the ranking information as the push information in response to the push request; when a processing task has been received, replacing the first data table in the processing task with the first-ranked second data table in the ranking information; when a cleanup task has been received, merging the first q second data tables in the ranking information with the first data table, where q is a natural number.
The similarity of any two data tables can be calculated with the above method, and this similarity calculation can be applied in data recommendation and data purification scenarios.
With the above embodiment, tables with high similarity can be ranked in search results by health score and quality score, so that better data is recommended to consumers. Through the consumers' choices, homogeneous data with few downstream applications can gradually be cleaned up and taken offline, achieving intelligent optimization of data applications. The similarity between pairs of data tables can also be used to determine, for the tables and fields referenced by a task, whether there is a substitute table with higher quality and health scores, and to guide the user to use the better table.
It should be noted that any embodiment of the application can determine the field lineage of a field in the above manner, that is, determine the machining information of the field.
Specifically, information acquisition unit includes:Parsing module, for the processing generation using tables of data where corresponding field
Code, parses the source table of the processing node of each in the machining path of corresponding field, until source table is the extraction of online table
Table;Logging modle, the processing logic for recording each processing node, wherein, processing logic includes:Carry out source word
Also include filter condition and/or processing function in section and result field, processing logic.
In the above embodiment of the application, parsing the field lineage may start from one field of one
table: determine the primary key of the table, then parse the upper-level source table of that primary
key field (which may include the source field mentioned above), the result field, the filter condition
(including direct field filtering and join filtering between tables), and the function used (that is,
the processing function in the above embodiment). If the upper-level source table is not an extraction
table of an online table, or an associated table in the filter condition is not an extraction table of
an online table, tracing continues upstream in the manner shown in Fig. 7 until the upstream tables and
the tables in the filter conditions are all extraction tables of online tables. The path traced at each
step is recorded, and the processing information is generated.
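The upstream tracing described above can be sketched as follows. This is a simplified illustration,
not the parser of the application: it assumes the processing code has already been parsed into a
hypothetical `PARSED` mapping, that each field has a single source field, and that joined tables in
filter conditions are not followed. The table names and the `EXTRACTION_TABLES` set are invented for
the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProcessingLogic:
    result_field: str                       # "table.field" produced by this step
    source_field: str                       # "table.field" it is derived from
    filter_condition: Optional[str] = None  # filter applied at this step, if any
    function: Optional[str] = None          # processing function, if any


# Hypothetical result of parsing the processing code of each table.
PARSED: Dict[str, ProcessingLogic] = {
    "report.amount": ProcessingLogic("report.amount", "dwd.amount", None, "sum"),
    "dwd.amount": ProcessingLogic("dwd.amount", "ods.amount", "status = 'paid'", None),
}

# Hypothetical set of tables extracted directly from online tables.
EXTRACTION_TABLES = {"ods"}


def trace_lineage(field: str) -> List[ProcessingLogic]:
    """Walk upstream from `field` until an extraction table is reached."""
    path: List[ProcessingLogic] = []
    seen = set()
    current = field
    while current.split(".")[0] not in EXTRACTION_TABLES:
        if current in seen:
            raise ValueError(f"cyclic lineage at {current}")
        seen.add(current)
        logic = PARSED.get(current)
        if logic is None:
            raise KeyError(f"no processing code found for {current}")
        path.append(logic)           # record the step that was traced
        current = logic.source_field
    return path


path = trace_lineage("report.amount")
for step in path:
    print(step)
```

The returned list plays the role of the processing information: one recorded processing logic per
node on the processing path.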
It should be noted that the modules or units in the above embodiment of the application share the
same examples and application scenarios as the corresponding steps, but are not limited to what is
disclosed in Embodiment 1 above. It should also be noted that the above units, as part of the
apparatus, may run in the computer terminal provided in Embodiment 1, and may be implemented in
software or in hardware.
It should be noted that those skilled in the art will clearly understand that, for convenience and
brevity of description, the specific working process of the data table processing apparatus described
above may refer to the corresponding process in the foregoing method embodiment, and is not repeated
here.
Embodiment 3
An embodiment of the application may provide a computer terminal, which may be any computer terminal
device in a group of computer terminals. Optionally, in this embodiment, the computer terminal may be
replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one of a plurality of
network devices of a computer network.
In this embodiment, the computer terminal may execute the following steps of the data table processing
method: comparing a first field in a first data table with a second field in a second data table; when
the comparison shows a difference in the identification information of the first field and the second
field, obtaining the processing information of the first field and the processing information of the
second field, where the processing information records the multiple processing logics in the
processing path of the corresponding field; comparing, along the processing path, each processing
logic of the corresponding fields; and, if the processing logics currently being compared are
inconsistent, determining that the processing logic currently being compared is the logic where the
difference arises.
With the above embodiment, when a comparison finds a difference between a first field and a second
field that have the same identification information in the data tables to be analyzed, the processing
logics of the first field and the second field are compared automatically; if the processing logics
differ, the differing logic is where the problem lies that causes the fields with the same
identification information to differ.
With the above embodiment, when fields that should be identical in two data tables differ, the source
of the difference can be located automatically from the processing logic of the corresponding fields,
which improves processing accuracy. The application thus solves the prior-art problem of low
efficiency when comparing data table contents and improves the efficiency of data table comparison.
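Locating the differing logic can be sketched as a step-by-step comparison of two processing paths. The
`Step` layout and the sample paths below are assumptions for illustration; in particular, field names
are written without table prefixes so that steps from different tables can be compared directly.

```python
from itertools import zip_longest
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Step(NamedTuple):
    source_field: str
    result_field: str
    filter_condition: Optional[str]
    function: Optional[str]


Difference = Tuple[int, Optional[Step], Optional[Step]]


def differing_steps(path_a: Sequence[Step],
                    path_b: Sequence[Step]) -> List[Difference]:
    """Return (index, step_a, step_b) for every position where the paths disagree."""
    differences: List[Difference] = []
    # zip_longest pads the shorter path with None, so a missing step also counts
    # as a difference.
    for index, (a, b) in enumerate(zip_longest(path_a, path_b)):
        if a != b:
            differences.append((index, a, b))
    return differences


path_a = [
    Step("amount", "amount", "status = 'paid'", None),
    Step("amount", "total", None, "sum"),
]
path_b = [
    Step("amount", "amount", "status <> 'void'", None),
    Step("amount", "total", None, "sum"),
]

diffs = differing_steps(path_a, path_b)
for index, a, b in diffs:
    print(f"step {index}: {a} != {b}")
```

Here the two fields share the same aggregation step but apply different filter conditions in the
first step, so that step is reported as the logic where the difference arises.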
Optionally, Figure 10 is a network environment diagram of a computer terminal according to an
embodiment of the application. As shown in Figure 10, the computer terminal 101 can be connected to a
server 102 through a network, and the computer terminal can include one or more processors (only one
is shown in the figure) and a memory as shown in Fig. 1.
The memory can be used to store software programs and modules, such as the program
instructions/modules corresponding to the data table processing method and apparatus in the
embodiments of the application. By running the software programs and modules stored in the memory, the
processor performs various functional applications and data processing, that is, implements the data
table processing method described above. The memory may include high-speed random access memory, and
may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or
other non-volatile solid-state memory. In some examples, the memory may further include memory located
remotely from the processor, and such remote memory may be connected to terminal A through a network.
Examples of such networks include, but are not limited to, the Internet, intranets, local area
networks, mobile communication networks, and combinations thereof.
Those of ordinary skill in the art will appreciate that the structure shown in Figure 10 is only
illustrative; the computer terminal may also be a terminal device such as a smartphone (for example an
Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device
(Mobile Internet Devices, MID), or a PAD. Figure 10 does not limit the structure of the above
electronic apparatus. For example, the terminal 10 may include more or fewer components than shown in
Figure 10 (such as a network interface or a display device), or have a configuration different from
that shown in Figure 10.
Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the
above embodiments can be completed by a program instructing hardware related to the terminal device.
The program can be stored in a computer-readable storage medium, and the storage medium can include a
flash disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM),
a magnetic disk, an optical disc, and the like.
Embodiment 4
An embodiment of the application further provides a storage medium. Optionally, in this embodiment,
the storage medium may be used to store the program code executed by the data table processing method
provided in Embodiment 1.
Optionally, in this embodiment, the storage medium may be located in any computer terminal in a group
of computer terminals in a computer network, or in any mobile terminal in a group of mobile terminals.
Optionally, in this embodiment, the storage medium is configured to store program code for executing
the following steps: comparing a first field in a first data table with a second field in a second
data table; when the comparison shows a difference in the identification information of the first
field and the second field, obtaining the processing information of the first field and the processing
information of the second field, where the processing information records the multiple processing
logics in the processing path of the corresponding field; comparing, along the processing path, each
processing logic of the corresponding fields; and, if the processing logics currently being compared
are inconsistent, determining that the processing logic currently being compared is the logic where
the difference arises.
The serial numbers of the above embodiments of the application are for description only and do not
indicate the relative merits of the embodiments.
In the above embodiments of the application, the description of each embodiment has its own emphasis;
for parts not described in detail in one embodiment, refer to the related description of other
embodiments.
In the several embodiments provided herein, it should be understood that the disclosed technical
content can be implemented in other ways. The apparatus embodiments described above are only
illustrative. For example, the division into units is only a division by logical function, and other
divisions are possible in an actual implementation: multiple units or components may be combined or
integrated into another system, or some features may be omitted or not executed. In addition, the
mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect
coupling or communication connection through some interfaces, units, or modules, and may be electrical
or take other forms.
The units described as separate components may or may not be physically separate, and the components
shown as units may or may not be physical units; that is, they may be located in one place or
distributed over multiple network units. Some or all of the units can be selected according to actual
needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the application may be integrated into one
processing unit, each unit may exist physically on its own, or two or more units may be integrated
into one unit. The integrated unit can be implemented in the form of hardware or in the form of a
software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an
independent product, it can be stored in a computer-readable storage medium. Based on this
understanding, the technical solution of the application in essence, or the part that contributes to
the prior art, or all or part of the technical solution, can be embodied in the form of a software
product. The computer software product is stored in a storage medium and includes several instructions
that cause a computer device (which may be a personal computer, a server, a network device, or the
like) to execute all or part of the steps of the methods described in the embodiments of the
application. The aforementioned storage medium includes various media that can store program code,
such as a USB flash disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random
Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the application. It should be noted that those of ordinary
skill in the art can make several improvements and refinements without departing from the principle
of the application, and such improvements and refinements shall also be regarded as falling within
the protection scope of the application.
Claims (15)
1. A method for processing data tables, characterized by comprising:
comparing a first field in a first data table with a second field in a second data table;
when the comparison shows a difference in the identification information of the first field and the
second field, obtaining the processing information of the first field and the processing information
of the second field, wherein the processing information records multiple processing logics in the
processing path of the corresponding field;
comparing, according to the processing path, each processing logic of the corresponding fields;
if the processing logics currently being compared are inconsistent, determining that the processing
logic currently being compared is the logic where the difference arises.
2. The method according to claim 1, characterized in that, before comparing the first field in the
first data table with the second field in the second data table, the method further comprises:
obtaining the identification information of the first field, and determining in the second data table
the second field that has the same identification information as the first field.
3. The method according to claim 2, characterized in that the identification information comprises a
field name, wherein comparing the first field in the first data table with the second field in the
second data table comprises:
comparing whether the field names of the first field and the second field are the same;
if the field names of the first field and the second field are different, determining by the
comparison that the identification information of the first field and the second field differs.
4. The method according to claim 2, characterized in that the identification information comprises
field metadata and processing logic, wherein comparing the first field in the first data table with
the second field in the second data table comprises:
comparing whether the field metadata and processing logic of the first field and the second field are
the same;
if the field metadata and processing logic of the first field and the second field are different,
determining by the comparison that the identification information of the first field and the second
field differs.
5. The method according to claim 1, characterized in that, before comparing the first field in the
first data table with the second field in the second data table, the method further comprises:
obtaining the processing information of each field of each data table among the data tables to be
analyzed;
using the processing logic in the processing information, judging whether the fields are fields with
the same identification information, to obtain a judgment result;
counting, according to the judgment result, the number of fields with the same identification
information shared by each pair of data tables among the data tables to be analyzed;
calculating the similarity of each pair of data tables based on the number;
obtaining a plurality of second data tables whose similarity to the first data table meets a preset
similarity condition.
6. The method according to claim 5, characterized in that using the processing logic in the processing
information to judge whether the fields are fields with the same identification information comprises:
if every processing logic of two fields is consistent, judging that the two fields are fields with the
same identification information;
if the two fields have different processing logics, judging that the two fields are fields with
different identification information.
7. The method according to claim 5, characterized in that, after obtaining the plurality of second
data tables whose similarity to the first data table meets the preset similarity condition, the method
further comprises:
sorting the plurality of second data tables according to a health attribute and a quality attribute to
obtain ranking information, wherein the health attribute characterizes the resource consumption value
of a data table, and the quality attribute characterizes at least the information completeness and
reliability of a data table.
8. The method according to claim 7, characterized in that, before obtaining the processing information
of each field of each data table among the data tables to be analyzed, the method further comprises at
least one of the following:
receiving a push request for obtaining tables similar to the first data table, and obtaining the data
tables to be analyzed based on the push request, wherein the data tables to be analyzed include the
first data table;
receiving a processing task for processing data, extracting the identifier of the first data table
from the processing task, and obtaining the data tables to be analyzed using the identifier of the
first data table;
receiving a cleanup task for cleaning up the first data table, and obtaining the data tables to be
analyzed based on the cleanup task.
9. The method according to claim 8, characterized in that, after obtaining the ranking information, the
method further comprises:
when the push request has been received, pushing the ranking information as the push information in
response to the push request;
when the processing task has been received, replacing the first data table in the processing task
with the first-ranked second data table in the ranking information;
when the cleanup task has been received, merging the top q second data tables in the ranking
information with the first data table, wherein q is a natural number.
10. The method according to any one of claims 1 to 9, characterized in that obtaining the processing
information of the first field and the processing information of the second field comprises:
using the processing code of the data table containing the corresponding field, parsing the source
table of each processing node in the processing path of the corresponding field, until the source
table is an extraction table of an online table;
recording the processing logic of each processing node, wherein the processing logic comprises a
source field and a result field, and further comprises a filter condition and/or a processing
function.
11. An apparatus for processing data tables, characterized by comprising:
a first comparing unit, configured to compare a first field in a first data table with a second field
in a second data table;
an information acquisition unit, configured to obtain, when the comparison shows a difference in the
identification information of the first field and the second field, the processing information of the
first field and the processing information of the second field, wherein the processing information
records multiple processing logics in the processing path of the corresponding field;
a second comparing unit, configured to compare, according to the processing path, each processing
logic of the corresponding fields;
a difference locating unit, configured to determine, if the processing logics currently being compared
are inconsistent, that the processing logic currently being compared is the logic where the difference
arises.
12. The apparatus according to claim 11, characterized in that the apparatus further comprises:
a field determining unit, configured to obtain the identification information of the first field and
determine in the second data table the second field that has the same identification information as
the first field.
13. The apparatus according to claim 12, characterized in that the identification information
comprises a field name, wherein the first comparing unit comprises:
a first comparison module, configured to compare whether the field names of the first field and the
second field are the same;
a first difference determining module, configured to determine by the comparison, if the field names
of the first field and the second field are different, that the identification information of the
first field and the second field differs.
14. The apparatus according to claim 12, characterized in that the identification information
comprises field metadata and processing logic, wherein the first comparing unit comprises:
a second comparison module, configured to compare whether the field metadata and processing logic of
the first field and the second field are the same;
a second difference determining module, configured to determine by the comparison, if the field
metadata and processing logic of the first field and the second field are different, that the
identification information of the first field and the second field differs.
15. The apparatus according to claim 11, characterized in that the apparatus further comprises:
an information acquisition unit, configured to obtain, before the first field in the first data table
is compared with the second field in the second data table, the processing information of each field
of each data table among the data tables to be analyzed;
a judging unit, configured to use the processing logic in the processing information to judge whether
the fields are fields with the same identification information, to obtain a judgment result;
a counting unit, configured to count, according to the judgment result, the number of fields with the
same identification information shared by each pair of data tables among the data tables to be
analyzed;
a calculating unit, configured to calculate the similarity of each pair of data tables based on the
number;
a table acquiring unit, configured to obtain a plurality of second data tables whose similarity to the
first data table meets a preset similarity condition.
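The similarity computation recited in claims 5, 6 and 15 can be illustrated with a short sketch. The
claims state only that similarity is calculated from the number of fields with the same
identification information; the formula used here (shared count divided by the larger field count),
the threshold, and the representation of a field's processing logics as a tuple of strings are all
assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Lineage = Tuple[str, ...]    # hypothetical: ordered processing logics of one field
Table = Dict[str, Lineage]   # field name -> lineage

TABLES: Dict[str, Table] = {
    "t1": {"amount": ("filter:paid", "fn:sum"), "user_id": ("copy",)},
    "t2": {"total": ("filter:paid", "fn:sum"), "uid": ("copy",), "city": ("fn:lower",)},
    "t3": {"city": ("fn:upper",)},
}


def same_field_count(a: Table, b: Table) -> int:
    """Count fields whose processing logics are all consistent (claim 6)."""
    overlap = Counter(a.values()) & Counter(b.values())  # multiset intersection
    return sum(overlap.values())


def similarity(a: Table, b: Table) -> float:
    """Assumed formula: shared fields divided by the larger field count."""
    largest = max(len(a), len(b))
    return same_field_count(a, b) / largest if largest else 0.0


def pairwise_similarity(tables: Dict[str, Table]) -> Dict[Tuple[str, str], float]:
    return {(x, y): similarity(tables[x], tables[y])
            for x, y in combinations(sorted(tables), 2)}


def similar_tables(first: str, tables: Dict[str, Table],
                   threshold: float = 0.5) -> List[str]:
    """Tables whose similarity to `first` meets the assumed preset condition."""
    return [name for name in sorted(tables)
            if name != first and similarity(tables[first], tables[name]) >= threshold]


sims = pairwise_similarity(TABLES)
second_tables = similar_tables("t1", TABLES)
print(sims)
print(second_tables)
```

Fields are matched by their processing logics rather than by name, so `amount` in `t1` and `total` in
`t2` count as the same field in this sketch.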
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610197071.4A CN107291672B (en) | 2016-03-31 | 2016-03-31 | Data table processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107291672A true CN107291672A (en) | 2017-10-24 |
CN107291672B CN107291672B (en) | 2020-11-20 |
Family
ID=60087795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610197071.4A Active CN107291672B (en) | 2016-03-31 | 2016-03-31 | Data table processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107291672B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN107291672B (en) | 2020-11-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||