CN107291672A - Method and apparatus for processing data tables - Google Patents

Info

Publication number
CN107291672A
Authority
CN
China
Prior art keywords
field
data
tables
processing
identification information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610197071.4A
Other languages
Chinese (zh)
Other versions
CN107291672B (en
Inventor
纪丽娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610197071.4A
Publication of CN107291672A
Application granted
Publication of CN107291672B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/194: Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application discloses a method and apparatus for processing data tables. The method includes: comparing a first field in a first data table with a second field in a second data table; when the comparison shows that the identification information of the first field and the second field differs, obtaining the processing information of the first field and the processing information of the second field, where the processing information records the multiple pieces of processing logic in the processing path of the corresponding field; comparing, along the processing path, each piece of processing logic of the corresponding fields; and, if the processing logic currently being compared is inconsistent, determining that this processing logic is the logic where the difference arises. The application addresses the technical problem of low efficiency when comparing data table contents.

Description

Method and apparatus for processing data tables
Technical field
The application relates to the field of data processing, and in particular to a method and apparatus for processing data tables.
Background
In the prior art, data tables are compared by directly comparing their data content; once a difference in content is found, the error is located by manually tracing upstream along the processing chain.
After a content difference is found, the differing data must be collected manually, and the processing chain of that data must then be traced upstream stage by stage, comparing one link at a time, to locate the error. Because this involves a large number of comparison tasks, the manual work is burdensome and the error rate of the manual process is high.
No effective solution has yet been proposed for the low efficiency of comparing data table contents described above.
Summary of the invention
The embodiments of the application provide a method and apparatus for processing data tables, to at least solve the problem of low efficiency when comparing data table contents.
According to one aspect of the embodiments of the application, a method for processing data tables is provided. The method includes: comparing a first field in a first data table with a second field in a second data table; when the comparison shows that the identification information of the first field and the second field differs, obtaining the processing information of the first field and the processing information of the second field, where the processing information records the multiple pieces of processing logic in the processing path of the corresponding field; comparing, along the processing path, each piece of processing logic of the corresponding fields; and, if the processing logic currently being compared is inconsistent, determining that this processing logic is the logic where the difference arises.
According to another aspect of the embodiments of the application, an apparatus for processing data tables is also provided. The apparatus includes: a first comparing unit, configured to compare a first field in a first data table with a second field in a second data table; an information acquiring unit, configured to obtain, when the comparison shows that the identification information of the first field and the second field differs, the processing information of the first field and the processing information of the second field, where the processing information records the multiple pieces of processing logic in the processing path of the corresponding field; a second comparing unit, configured to compare, along the processing path, each piece of processing logic of the corresponding fields; and a difference locating unit, configured to determine, if the processing logic currently being compared is inconsistent, that this processing logic is the logic where the difference arises.
With the above embodiments, when a first field and a second field that have the same identification information in the first data table and the second data table are found to differ, the processing logic of the first field and the second field is compared automatically; if the processing logic differs, the differing logic is where the problem lies that caused the fields with the same identification information to differ. Thus, when fields in two data tables that should be identical turn out to differ, the source of the difference can be located automatically from the processing logic of the corresponding fields, which improves processing accuracy. The application thereby solves the prior-art problem of low efficiency when comparing data table contents and improves the efficiency of data table comparison.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the application and form part of it. The illustrative embodiments of the application and their description are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a hardware block diagram of a terminal for a method of processing data tables according to an embodiment of the application;
Fig. 2 is a flowchart of a method for processing data tables according to an embodiment of the application;
Fig. 3 is a first flowchart of an optional method for processing data tables according to an embodiment of the application;
Fig. 4 is a second flowchart of an optional method for processing data tables according to an embodiment of the application;
Fig. 5 is a flowchart of a method for processing data tables applied to scenario one according to an embodiment of the application;
Fig. 6 is a flowchart of a method for processing data tables applied to scenario two according to an embodiment of the application;
Fig. 7 is a flowchart of an optional way of obtaining the processing information of data table fields according to an embodiment of the application;
Fig. 8 is a schematic diagram of an optional apparatus for processing data tables according to an embodiment of the application;
Fig. 9 is a schematic diagram of another optional apparatus for processing data tables according to an embodiment of the application;
Fig. 10 is a network environment diagram of a terminal according to an embodiment of the application.
Detailed description
To help those skilled in the art better understand the solution of the application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the application without creative effort shall fall within the scope of protection of the application.
It should be noted that the terms "first", "second" and the like in the description, the claims and the drawings of the application are used to distinguish similar objects and do not describe a particular order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here. In addition, the terms "comprise" and "have", and any variants of them, are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
First, the terms used in the embodiments of the application are explained as follows; these explanations do not limit the embodiments of the application:
Online table: a table produced by a business system; data in an online table is written to the database as a result of business operations or business flows.
Data lineage (blood relationship): data extracted from an online table is turned into new data through computation or processing; the computation chain between the online data and the new data is called the lineage.
Processing path: records, for the processing of a piece of data, the name of each processing node (i.e. processing step), the order of the processing nodes, the links between processing nodes, and the processing logic of each processing node. In the embodiments of the application, processing means performing computation or logical operations on the data extracted from an online table.
Processing logic: records the source field, result field, filter condition and processing function of a processing node (the processing function may be a logical processing function).
Data caliber: the actual business meaning expressed by the data.
Table similarity: whether the fields of two tables are the same is judged mainly from attribute information such as the source of the data, the processing of the data and the granularity of the data; the similarity of the tables is then computed from the number of identical fields.
Table quality score: measures the data quality of a table, mainly the completeness and reliability of its information.
Table health score: measures how healthily a table is used, based on the storage and computing resources the table consumes.
Table access heat: describes the number of times a table is used within a period of time; the more uses, the higher the heat.
Embodiment 1
According to the embodiments of the application, an embodiment of a method for processing data tables is also provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
The method embodiment provided in Embodiment 1 of the application may be executed on a mobile terminal, a computer terminal or a similar computing device. Taking execution on a computer terminal as an example, Fig. 1 is a hardware block diagram of a terminal for a method of processing data tables according to an embodiment of the application. As shown in Fig. 1, terminal 10 may include one or more processors 102 (only one is shown in the figure; a processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the above electronic device. For example, terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1.
Memory 104 can be used to store the software programs and modules of application software, such as the program instructions/modules corresponding to the method for processing data tables in the embodiments of the application. The processor 102 runs the software programs and modules stored in memory 104 to perform various functional applications and data processing, i.e. to implement the above method for processing data tables. Memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from the processor 102; such remote memory can be connected to terminal 10 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations of these.
The transmission device 106 is used to receive or send data via a network. A specific example of the network may include a wireless network provided by the communication provider of terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (RF) module, used to communicate with the Internet wirelessly.
In the above operating environment, the application provides the method for processing data tables shown in Fig. 2. Fig. 2 is a flowchart of the method for processing data tables according to an embodiment of the application.
As shown in Fig. 2, the method may include the following steps:
Step S202: compare a first field in a first data table with a second field in a second data table;
Step S204: when the comparison shows that the identification information of the first field and the second field differs, obtain the processing information of the first field and the processing information of the second field, where the processing information records the multiple pieces of processing logic in the processing path of the corresponding field, the corresponding field being the first field or the second field;
Step S206: compare, along the processing path, each piece of processing logic of the corresponding fields;
Step S208: if the processing logic currently being compared is inconsistent, determine that this processing logic is the logic where the difference arises.
With the above embodiment, when a first field and a second field that have the same identification information in the data tables to be analyzed are found to differ, the processing logic of the first field and the second field is compared automatically; if the processing logic differs, the differing logic is where the problem lies that caused the fields with the same identification information to differ. Thus, when fields in two data tables that should be identical turn out to differ, the source of the difference can be located automatically from the processing logic of the corresponding fields, which improves processing accuracy. The application thereby solves the prior-art problem of low efficiency when comparing data table contents and improves the efficiency of data table comparison.
The identification information is information used to identify a field; the identification information of a field points to that field, for example the field name or the field's processing logic.
Step S202, comparing the first field in the first data table with the second field in the second data table, may be implemented by continuously checking whether fields in the data tables to be analyzed that originally had the same identification information still satisfy a predetermined comparison condition. If these fields still satisfy the predetermined comparison condition, step S202 is performed again; if they are found to no longer satisfy it, the fields with the same identification information in the data tables to be analyzed have diverged. In essence, the difference arises because the fields with the same identification information have undergone further, new processing, and steps S204 to S208 can then locate the processing logic that caused the difference.
The predetermined comparison condition in the above embodiment may be determined by the comparison scenario and may include: identical field names, identical field processing logic, or identical field metadata and processing logic.
In the above embodiment, the processing information obtained in step S204 records the processing logic in the processing path of the corresponding field; the processing logic may include at least one of the following: the source field, result field, filter condition and processing function of the corresponding processing node.
When comparing processing logic, the source field, result field, filter condition and processing function of the processing nodes can be compared; if the processing logic currently being compared is inconsistent, it is determined to be the logic where the difference arises, which locates the position of the difference.
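The node-by-node comparison of steps S206 and S208 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the class name `ProcessingLogic`, the function name `first_divergent_logic`, the string encoding of each item, and the choice to treat a missing node as a difference are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProcessingLogic:
    """Hypothetical record of one processing node's logic (four items named in the text)."""
    source_field: str
    result_field: str
    filter_condition: str
    function: str


def first_divergent_logic(
    path_a: Sequence[ProcessingLogic],
    path_b: Sequence[ProcessingLogic],
) -> Optional[int]:
    """Return the index of the first node whose logic differs, or None if the paths match."""
    # Walk both processing paths in order and compare the logic node by node.
    for index, (logic_a, logic_b) in enumerate(zip(path_a, path_b)):
        if logic_a != logic_b:
            return index
    # Assumption: if one path has extra nodes, the first extra node counts as the difference.
    if len(path_a) != len(path_b):
        return min(len(path_a), len(path_b))
    return None


path_a = [
    ProcessingLogic("orders.amount", "amt", "status = 'paid'", "sum"),
    ProcessingLogic("amt", "total", "", "round"),
]
path_b = [
    ProcessingLogic("orders.amount", "amt", "status = 'paid'", "sum"),
    ProcessingLogic("amt", "total", "", "floor"),
]
print(first_divergent_logic(path_a, path_b))
```

Because the dataclass compares all four items at once, a change in any one of them (here the processing function of the second node) is enough to flag that node.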
In one optional scheme, before the first field in the first data table is compared with the second field in the second data table, specifying information is obtained, where the specifying information designates the first field and the second field.
That is, the user can specify the comparison field for a target field in the data tables to be analyzed; the target field and the comparison field are taken as the first field in the first data table and the second field in the second data table.
With the above embodiment, the fields to be compared can be specified directly. The scheme can be applied to data migration: the data before and after migration can be monitored to verify whether the migration is complete, and if it is found to be incomplete, the cause of the difference is located automatically through the field lineage.
In another optional scheme, before the first field in the first data table is compared with the second field in the second data table, the identification information of the first field is obtained and used to determine the second field in the second data table that has the same identification information as the first field.
Specifically, the identification information (such as the processing information) of the target field of the first data table is obtained, and a matching field that is identical to the target field and belongs to another data table is determined (i.e. the second field in the other data table whose identification information is the same as that of the target field).
Besides specifying a comparison field, the field lineage and lineage path of the target field of the first data table can be used to extract a matching field identical to the target field; the matching field may be located in another data table. In that case the processing paths of the target field and the matching field are exactly the same, and the predetermined comparison rule (i.e. the predetermined comparison condition above) is derived in the step of determining the matching field.
Optionally, identical identification information includes: identical field names, or identical field metadata and processing logic (for example, fields with the same identification information are obtained by processing the same metadata with the same processing logic).
In an optional embodiment, the identification information includes the field name, and comparing the first field in the first data table with the second field in the second data table includes: comparing whether the field names of the first field and the second field are the same; if the field names of the first field and the second field differ, the comparison shows that the identification information of the first field and the second field differs.
In another optional embodiment, the identification information includes the field metadata and the processing logic, and comparing the first field in the first data table with the second field in the second data table includes: comparing whether the field metadata and processing logic of the first field and the second field are the same; if the field metadata and processing logic of the first field and the second field differ, the comparison shows that the identification information of the first field and the second field differs.
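The two ways of comparing identification information described above can be sketched as follows. The names `FieldIdentity` and `identity_differs`, the `mode` parameter and the string encoding of metadata and logic are hypothetical and serve only to illustrate the two alternatives.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldIdentity:
    """Hypothetical identification information of one field."""
    name: str
    metadata: str              # assumed: the source metadata the field is built from
    logic: Tuple[str, ...]     # assumed: one entry per processing node, in order


def identity_differs(a: FieldIdentity, b: FieldIdentity, mode: str = "name") -> bool:
    """Return True when the identification information of the two fields differs."""
    if mode == "name":
        # First alternative: the identification information is the field name.
        return a.name != b.name
    if mode == "metadata_and_logic":
        # Second alternative: the identification information is metadata plus processing logic.
        return (a.metadata, a.logic) != (b.metadata, b.logic)
    raise ValueError(f"unknown mode: {mode}")


field_a = FieldIdentity("gmv", "orders.amount", ("filter paid", "sum"))
field_b = FieldIdentity("gmv", "orders.amount", ("filter paid", "avg"))
print(identity_differs(field_a, field_b, "name"))
print(identity_differs(field_a, field_b, "metadata_and_logic"))
```

In this example the two fields share a name, so only the second alternative detects that their processing logic has diverged.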
Taking two data tables as an example, the comparison approach in the embodiments of the application is described in detail below with reference to Fig. 3. As shown in Fig. 3, the embodiment can be implemented through the following steps:
Step S301: start the data comparison mode.
Step S302: detect whether comparison data has been specified for the target field in the system.
If comparison data has been specified for the target field, step S304 is performed; if not, step S303 is performed.
The user can specify a comparison field b (i.e. the comparison data) for target field a in the first data table A (i.e. the first field in the above embodiment); the comparison field belongs to the second data table B. When comparison data has been specified for the target field, the system takes target field a and comparison field b as the first field and the second field.
Step S303: obtain a matching field from the field lineage of the target field.
Specifically, the system can determine a matching field identical to the target field through the field lineage and the lineage path. For example, the system determines that target field a in the first data table A needs to be monitored and obtains the field lineage of target field a; if the field lineage of some field is the same as that of the target field, that field is determined to be the matching field identical to the target field.
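Step S303 can be illustrated with a small sketch that looks up fields whose lineage equals that of the target field. The function name `find_matching_fields`, the `table.field` keys and the tuple encoding of a lineage are assumptions for illustration; the application does not prescribe a storage format for field lineage.

```python
from typing import Dict, List, Tuple

# Assumed encoding: a lineage is the ordered tuple of processing steps of a field.
Lineage = Tuple[str, ...]


def find_matching_fields(target: str, lineages: Dict[str, Lineage]) -> List[str]:
    """Return every other field whose lineage equals the lineage of the target field."""
    target_lineage = lineages[target]
    return [
        name
        for name, lineage in lineages.items()
        if name != target and lineage == target_lineage
    ]


lineages = {
    "A.a": ("source:q", "filter paid", "sum"),
    "B.b": ("source:q", "filter paid", "sum"),
    "B.c": ("source:q", "sum"),
}
print(find_matching_fields("A.a", lineages))
```

Here only `B.b` shares the full lineage of target field `A.a`, so it is the one returned as the matching field.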
Step S304: obtain the monitoring rule.
While steps S302 and S303 are performed, the monitoring rule (i.e. the predetermined comparison rule) can be derived from the field lineage, for example: processing logic + online data = offline field, or identical field names between tables.
Step S305: judge whether the identical fields violate the monitoring rule.
If the identical fields (i.e. the first field and the second field above) violate the monitoring rule, it is determined that they differ, and step S306 is performed; if they do not violate the monitoring rule, monitoring continues.
For example, when the monitoring rule is that field names are identical between tables, the rule is violated if the field names differ; this may be because the lineage of the original field has been changed, so the lineage needs to be obtained again.
Step S306: obtain the field lineage of the first field and the second field.
Optionally, because the field lineage has changed, the field lineage can be recalculated.
Step S307: locate the problem using the path derived from the field lineage.
By comparing the lineage before and after the change, and the output of each step in the lineage, if the information of some processing node in the lineage is inconsistent, that processing node is automatically identified as the node where the problem occurs.
With the above embodiment, data is cross-checked according to the lineage between the data (the lineage can be recorded in the processing path); for example, comparison rules are configured between the first-layer source data (such as data extracted from an online table, or the data in the online table) and the end consumption data, so that problems are flagged early and the cause of a difference is located automatically through the field lineage.
Based on the above embodiments, the application also provides a way of determining data table similarity.
Specifically, before the first field in the first data table is compared with the second field in the second data table, the method may further include: obtaining the processing information of each field of each data table among the data tables to be analyzed, where the processing information of a field records at least each piece of processing logic in the processing path of that field; using the processing logic in the processing information to judge whether the fields have the same identification information, obtaining a judgment result; counting, according to the judgment result, the number of fields with the same identification information shared by each pair of data tables among the data tables to be analyzed; calculating the similarity of each pair of data tables from that number; and obtaining the second data tables whose similarity to the first data table satisfies a preset similarity condition.
In an optional embodiment, when the data granularity of two fields is the same, if the source fields of the first processing nodes of the two fields are consistent and the result fields of their last processing nodes are consistent, the two fields are fields with the same identification information.
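The endpoint rule of this optional embodiment can be sketched as follows. The names `Node` and `same_by_endpoints` and the string encoding of granularity are hypothetical; returning False for an empty path is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Node:
    """Hypothetical processing node, reduced to its source and result fields."""
    source_field: str
    result_field: str


def same_by_endpoints(
    granularity_a: str,
    path_a: Sequence[Node],
    granularity_b: str,
    path_b: Sequence[Node],
) -> bool:
    """Apply the rule: same granularity, same first source field, same last result field."""
    # Assumption: a field with no processing nodes cannot be matched by this rule.
    if granularity_a != granularity_b or not path_a or not path_b:
        return False
    return (
        path_a[0].source_field == path_b[0].source_field
        and path_a[-1].result_field == path_b[-1].result_field
    )


# Two paths of different length that start from source q and both end in result m.
path_a = [Node("q", "r1"), Node("r1", "r2"), Node("r2", "r3"), Node("r3", "m")]
path_b = [Node("q", "r1"), Node("r1", "r2"), Node("r2", "r3"), Node("r3", "n"), Node("n", "m")]
print(same_by_endpoints("day", path_a, "day", path_b))
```

The rule looks only at the two ends of each path, so paths with a different number of intermediate nodes can still be judged to carry the same identification information.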
In another optional embodiment, in the above method of determining data table similarity, whether two fields have the same identification information can be determined from each piece of processing logic on the fields' processing paths.
Specifically, using the processing logic in the processing information to judge whether the fields have the same identification information and obtaining a judgment result may include: if every piece of processing logic of two fields is consistent, judging that the two fields have the same identification information; if the two fields have any differing processing logic, judging that the two fields have different identification information. The judgment result includes information on the fields with the same identification information and information on the fields with different identification information.
It should be further noted that the processing logic may include the filter condition, processing function, source data and result data of the corresponding processing path.
Optionally, if all the information in a piece of processing logic is consistent, the processing logic is consistent; if the source data in the processing logic is consistent but the result data is not, the processing logic is necessarily inconsistent.
In another optional scheme, the source of field a is q and the source of field b is q; field a has 4 processing nodes and field b has 5. Field a and field b may still be identical fields: for example, the first 3 processing nodes are consistent, the result of the 4th processing node of field a is m (i.e. the attribute value of field a), the result of the 4th processing node of field b is n, but the result of the 5th processing node of field b is m; the two fields are then also fields with the same identification information.
With the above embodiment, when the field lineage (i.e. the processing information) of two fields is compared, one of them can be taken as the reference field; for example, the reference field is set as the target field and the other of the two fields as the comparison field, and the processing logic of each processing node of the target field is compared with the processing logic defined in the processing of the comparison field. For example, the target field has n processing nodes in total and the comparison field has m. In the embodiments of the application, the source field is the source data and the result field is the result data.
Optionally, the field names of the fields may be compared first. If the field names of two fields differ, the two fields have different identification information; if the field names are the same, the processing logic of the first processing node of the two fields is compared, for example by checking the source data of the first processing node 1 of each field: if the source data of the first processing node 1 is inconsistent, the two fields are different fields.
Further, to ensure the accuracy of the fields identified as having the same identification information, lineage verification is performed on the intermediate processing nodes. If the source data of the first processing nodes of the two fields is consistent, the result data of processing node x of the target field can be compared with the result data in each piece of processing logic of the comparison field; if the result of the y-th processing node of the comparison field is consistent with the result data of processing node x, then the remaining (n-x) processing nodes of the target field are verified against the processing logic of the remaining (m-y) processing nodes of the comparison field.
An embodiment of data purification is described in detail below with reference to Fig. 4. As shown in Fig. 4, the embodiment may include the following steps:
Step S401: Obtain the machining information of each field of each data table among the data tables to be analyzed, where the machining information of a field is at least used to record each processing logic in the machining path of that field.
Step S402: Using the processing logic in the machining information, judge whether the fields are fields with identical identification information, and obtain a judgment result.
Optionally, when the data granularity of two fields is the same, if the source fields of the first processing nodes of the two fields are consistent and the result fields of their last processing nodes are consistent, the two fields have identical identification information; otherwise they have different identification information.
Step S403: According to the judgment result, count the number of fields with identical identification information between each pair of data tables among the data tables to be analyzed.
Step S404: Based on the number of fields with identical identification information between each pair of data tables, calculate the similarity of each pair of data tables.
After the number of fields with identical identification information in the two data tables to be analyzed is obtained, this step can be carried out as follows:
The similarity P of each pair of data tables is calculated according to the following formula:
P = Y*2/(M+N), where, in this embodiment, Y is the number of fields with identical identification information shared by the pair of data tables, M is the number of fields in one data table of the pair, and N is the number of fields in the other data table of the pair.
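As a small worked illustration of the formula P = Y*2/(M+N), the following sketch computes the similarity of two tables. The function name `table_similarity` and the sample counts are assumptions made for illustration only.

```python
def table_similarity(shared_fields: int, fields_m: int, fields_n: int) -> float:
    """P = Y*2/(M+N): Y shared fields, M and N field counts of the two tables."""
    if fields_m + fields_n == 0:
        raise ValueError("both tables have no fields")
    if shared_fields > min(fields_m, fields_n):
        raise ValueError("shared field count cannot exceed either table's field count")
    return shared_fields * 2 / (fields_m + fields_n)


# Hypothetical example: two 10-field tables that share 9 fields.
p = table_similarity(9, 10, 10)
print(p)
```

With these sample counts P is 18/20, so the pair would pass a 90% threshold only if the condition is "greater than or equal to"; with a strict "greater than" condition it would not.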
The similarity of any two data tables can be calculated by the above method, and this similarity calculation can be applied in data recommendation and data purification scenarios.
After multiple second data tables whose similarity to the first data table meets a preset similarity condition are obtained, the method may further include: sorting the multiple second data tables by health attribute and quality attribute to obtain ranking information, where the health attribute characterizes the resource consumption of a data table and the quality attribute at least characterizes the completeness and reliability of the information in a data table.
The preset similarity condition includes: the similarity is greater than a preset threshold; the data table ranks in the top N after the data tables similar to the first data table are sorted by similarity.
For example, after the similarity of each pair of data tables is determined by the above scheme, the data tables whose similarity to the first data table is greater than the preset threshold (e.g. 90%) are taken as second data tables. The health attribute (e.g. the health score of the table) and the quality attribute (e.g. the quality score of the table) of each second data table are obtained, and the second data tables are sorted by health score and quality score (the weighted result of the health score and the quality score can be used as the ranking score of the table) to obtain the ranking information of the second data tables. The data tables at the front of the ranking are those that are highly correlated with the first data table and have better quality and health.
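The filtering and ranking just described can be sketched as below. The `Candidate` type, the equal weights of 0.5 and the sample scores are assumptions for illustration; the text only states that a weighted result of the health score and quality score may serve as the ranking score.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A table compared against the first data table (hypothetical structure)."""
    name: str
    similarity: float  # similarity P to the first data table
    health: float      # health score of the table
    quality: float     # quality score of the table


def rank_candidates(candidates: List[Candidate], threshold: float = 0.9,
                    w_health: float = 0.5, w_quality: float = 0.5) -> List[Candidate]:
    """Keep tables whose similarity exceeds the threshold, best ranking score first."""
    kept = [c for c in candidates if c.similarity > threshold]
    # Ranking score: weighted sum of health score and quality score (weights assumed).
    return sorted(kept,
                  key=lambda c: w_health * c.health + w_quality * c.quality,
                  reverse=True)


tables = [
    Candidate("t2", 0.99, 60.0, 70.0),
    Candidate("t1", 0.95, 80.0, 90.0),
    Candidate("t3", 0.50, 99.0, 99.0),  # dropped: similarity below the threshold
]
print([c.name for c in rank_candidates(tables)])
```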
The above data processing method of the application can be applied in the following scenarios:
Before the machining information of each field of each data table among the data tables to be analyzed is obtained, a push request for obtaining tables similar to the first data table is received, and the data tables to be analyzed are obtained based on the push request, where the data tables to be analyzed include the first data table; that is, the method is applied in a data push scenario;
A processing task for processing data is received, the identifier of the first data table is extracted from the processing task, and the data tables to be analyzed are obtained using the identifier of the first data table; that is, the method can be applied to replacing the data table used in a task;
A clean-up task for cleaning up the first data table is received, and the data tables to be analyzed are obtained based on the clean-up task; that is, the method can be applied to data clean-up.
Specifically, after the ranking information is obtained, the method may further include: when a push request has been received, using the ranking information as the push information in response to the push request; when a processing task has been received, replacing the first data table in the processing task with the first-ranked second data table in the ranking information; when a clean-up task has been received, merging the first q second data tables in the ranking information with the first data table, where q is a natural number.
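The three ways of using the ranking information can be summarized in a short dispatch sketch. The request kinds `"push"`, `"processing"` and `"cleanup"` and the returned dictionaries are hypothetical conventions chosen for this illustration, not part of the described method.

```python
from typing import Dict, List


def apply_ranking(request_kind: str, ranking: List[str],
                  first_table: str, q: int = 1) -> Dict[str, object]:
    """Decide what to do with the ranked second data tables for each request kind."""
    if request_kind == "push":
        # The ranking itself is returned as the push information.
        return {"action": "push", "tables": list(ranking)}
    if request_kind == "processing":
        # Replace the first data table with the first-ranked second data table, if any.
        new_table = ranking[0] if ranking else first_table
        return {"action": "replace", "old": first_table, "new": new_table}
    if request_kind == "cleanup":
        # Merge the first q second data tables with the first data table.
        return {"action": "merge", "tables": [first_table] + list(ranking[:q])}
    raise ValueError(f"unknown request kind: {request_kind}")


print(apply_ranking("cleanup", ["t1", "t2", "t4"], "t0", q=2))
```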
Taking data pushing as the application scenario, an embodiment of the application is described in detail below with reference to Fig. 5.
As shown in Fig. 5, the embodiment may include the following steps:
Step S501: Obtain the data table name in the push request.
Step S502: Obtain the field blood relationship of each field in the data table according to the data table name.
Step S503: Determine the fields with identical identification information in the tables according to the field blood relationship of each field.
The way of determining the fields with identical identification information is consistent with the implementation in the above embodiments and is not repeated here.
Step S504: Calculate the similarity of two data tables according to the number of fields with identical identification information in the two data tables.
Step S505: Sort in descending order by similarity, health score and quality score, and make recommendations.
The processing of this step is consistent with the processing in the above embodiments and is not repeated here.
In the above embodiment, the similarity of tables is calculated from the number of fields with identical identification information: similarity = number of fields with identical identification information * 2 / (number of fields in table A + number of fields in table B). When a user searches for a table, the tables whose similarity exceeds a certain range are recommended to the user in descending order of quality score and health score.
With the above embodiment, tables with high similarity can be ranked in search results by health score and quality score, so that better data is recommended to consumers. Through the consumers' choices, homogeneous data with few downstream applications can be gradually cleaned up and taken offline, achieving intelligent optimization of data applications.
Taking data pushing as the application scenario, an embodiment of the application is described in detail below with reference to Fig. 6.
Step S601: Obtain the data table name in the processing task request.
The data table name in all embodiments of the application may be an ID.
Step S602: Obtain the field blood relationship of each field in the data table according to the data table name.
Step S603: Determine the fields with identical identification information between tables according to the field blood relationship of each field.
The way of determining the fields with identical identification information is consistent with the implementation in the above embodiments and is not repeated here.
Step S604: Calculate the similarity of two data tables according to the number of fields with identical identification information in the two data tables.
A table whose similarity exceeds a certain threshold can be used as a substitute table for the data table in the task being replaced.
Step S605: Sort in descending order by health score and quality score, and make recommendations.
Step S606: Determine whether all tasks have been traversed.
If so, end; otherwise, return to step S602.
The data table in a task can be replaced with a substitute table that has high similarity, a high health score and a high quality score.
With the above embodiment, the similarity between each pair of data tables can be used to determine, for the tables and fields referenced by a task, whether a substitute table with higher quality and health scores exists, and to guide the user to the better table.
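The traversal of tasks in steps S601 to S606 can be sketched as a simple loop. The input structures (`tasks`, `similarity`, `scores`) and the use of a single combined score per table are assumptions made for this illustration.

```python
from typing import Dict, Tuple


def suggest_replacements(tasks: Dict[str, str],
                         similarity: Dict[Tuple[str, str], float],
                         scores: Dict[str, float],
                         threshold: float = 0.9) -> Dict[str, str]:
    """For each task, suggest a similar table with a higher combined score, if any.

    tasks: task id -> table referenced by the task
    similarity: (table, other table) -> similarity P
    scores: table -> combined health/quality score (assumed precomputed)
    """
    suggestions = {}
    for task_id, table in tasks.items():           # traverse all tasks
        current = scores.get(table, 0.0)
        candidates = [other for (t, other), sim in similarity.items()
                      if t == table and sim > threshold
                      and scores.get(other, 0.0) > current]
        if candidates:
            # Recommend the best-scoring substitute table.
            suggestions[task_id] = max(candidates, key=lambda o: scores.get(o, 0.0))
    return suggestions


tasks = {"job_1": "orders_a", "job_2": "users_a"}
similarity = {("orders_a", "orders_b"): 0.95, ("orders_a", "orders_c"): 0.97,
              ("users_a", "users_b"): 0.60}
scores = {"orders_a": 70.0, "orders_b": 85.0, "orders_c": 65.0,
          "users_a": 80.0, "users_b": 90.0}
print(suggest_replacements(tasks, similarity, scores))
```

In this sample, only `job_1` receives a suggestion: `orders_b` is both similar enough and better scored, while `orders_c` scores lower than the current table and `users_b` is not similar enough.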
The application also provides a scheme for periodically cleaning up tables with a low access level. Its specific processing is consistent with the processing described above. In this application scenario, storage and computing resources can be released and the data architecture optimized, for example by merging tables with high similarity and making them compatible. Compatibility can be achieved through table joins. For example, suppose the similarity of the first data table and the second data table is 99%, which exceeds the preset threshold of 90%. If the health score and the quality score of the second data table are both higher than those of the first data table, the second data table can be used to replace the first data table; if the evaluation score determined from the health score and quality score of the second data table exceeds the evaluation score of the first data table, the second data table can also be used to replace the first data table. Of course, in these cases, instead of replacing the first data table with the second data table, the two tables can be joined and the join result used to replace both the first data table and the second data table.
Specifically, if the table fields referenced in an existing task are identical to those of other tables, the table can be replaced with one of those tables. Based on the health scores and quality scores of the tables, users are asked to switch to the better table, so that homogeneous data with few downstream applications can be gradually cleaned up and taken offline, achieving intelligent optimization of data applications.
In the prior art, synchronized clean-up of same-source tables uses only one layer of blood relationship: when data is extracted from online to offline, only whether the online tables from which the offline data is extracted are the same is judged. Tables repeatedly extracted from the same source are identified in this way, one of them is retained and the rest are taken offline. In this operation, although same-source data tables come from the same data table, the processing applied to different tables may differ, so that same-source tables in fact record different information. Determining whether data tables are identical merely by judging their source is therefore not sound.
By contrast, the application determines the fields with identical identification information through the field blood relationships of the fields in the data tables, and determines the similarity of two data tables from the number of such fields, rather than judging two tables to be identical merely because their sources are the same. The judgment used in the application compares and analyzes the processing of the fields of the two data tables, and can therefore distinguish same-source data tables that have the same source but do not record the same content.
In the embodiments of the application, obtaining the machining information includes: using the machining code of the data table in which the corresponding field is located, parsing the source table of each processing node in the machining path of the field until the source table is an extraction table of an online table; and recording the processing logic of each processing node, where the processing logic includes a source field and a result field, and further includes a filter condition and/or a processing function.
It should be noted that any one embodiment of the application can determine the field of a field through the above way Blood relationship, namely determine the machining information of field.
An embodiment of the application is described in detail below with reference to Fig. 7. As shown in Fig. 7, the embodiment may include the following steps:
Step S701: Input a table field.
In this embodiment, the following operations need to be performed on each field in the table.
Step S702: Determine the primary key of the table based on the table fields.
Suppose the data table is an order table that records information such as the order number and the purchaser. The primary key of the data table can be determined from the number of entries in the data table and the number of entries corresponding to each field: if the number of entries of a field is consistent with the number of entries in the data table, that field is the primary key of the data table. In the order table above, 100 orders are recorded, with 100 order numbers but only 60 purchasers, so the order number field is the primary key of the order table.
Step S703: Record the upper-layer source table of the primary key.
The machining code of the data table can be obtained and the upper-layer source table of the data table parsed from it; similarly, the source field of each field can also be read from the machining code of the data table.
Step S704: Record the filter condition of the processing node.
Processing a data table may involve filtering it. The filter condition based on the upper-layer source table is read from the machining code, and the association table corresponding to the filter condition of the processing node is obtained.
The table joins in the embodiments of the application may include both direct field filtering and join filtering between tables.
Step S705: Judge whether the association table of the processing node has filtered data.
If the association table of the processing node has filtered data, perform step S706; if it has not, judge the association table to be an extraction table, and end.
An extraction table in the embodiments of the application is a table generated from data extracted from an online table.
Step S706: Record the association table and the fields in the table.
Step S707: Record the processing function applied to the field.
Step S708: Judge whether the upper-layer source table and the association table are both extraction tables of online tables.
If so, the blood relationship parsing is complete; if not, return to step S703.
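The primary-key test of step S702 can be sketched as follows, reading "the number of entries of a field" as the number of distinct values of that field, which is what the order table example implies. The function name and the row representation (a list of dictionaries with hashable values) are assumptions for illustration.

```python
from typing import Dict, Hashable, List


def find_primary_key_fields(rows: List[Dict[str, Hashable]]) -> List[str]:
    """Return the fields whose number of distinct values equals the number of rows."""
    if not rows:
        return []
    total = len(rows)
    return [field for field in rows[0]
            if len({row[field] for row in rows}) == total]


# Order table from the example: 100 orders, 100 order numbers, 60 purchasers.
orders = [{"order_no": i, "purchaser": f"buyer_{i % 60}"} for i in range(100)]
print(find_primary_key_fields(orders))
```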
In the above embodiments of the application, the fields of the same attribute mentioned in the preceding embodiments refer to fields with identical identification information. Field blood relationship parsing can start from one field of a table: determine the primary key of the table, then parse the upper-layer source table of the primary key field (which can include the source field described above), the result field, the filter condition (including direct field filtering and join filtering between tables) and the function used (i.e. the processing function in the above embodiments). If the upper-layer source table is not an online table, or the association table in the filter condition is not an extraction table of an online table, the tracing continues upstream in the manner shown in Fig. 7 until the upstream tables and the tables in the filter conditions are all extraction tables of online tables. The path traced at each step is recorded to generate the machining information.
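The upstream tracing described above can be sketched as a loop over already-parsed machining code. This is a simplified sketch under stated assumptions: `parsed_code` is a hypothetical mapping from (table, field) to the parsed step, the parsing of the machining code itself is not shown, and association tables introduced by join filters are recorded in the filter text only rather than traced further.

```python
from typing import Dict, List, Optional, Set, Tuple

Step = Dict[str, Optional[str]]


def trace_lineage(table: str, field: str,
                  parsed_code: Dict[Tuple[str, str], Step],
                  extraction_tables: Set[str]) -> List[Step]:
    """Walk upstream from (table, field) until an extraction table is reached."""
    path: List[Step] = []
    seen = set()
    while table not in extraction_tables:
        if (table, field) in seen:
            raise ValueError("cycle detected in machining code")
        seen.add((table, field))
        step = parsed_code[(table, field)]
        # Record the processing logic of this node.
        path.append({"table": table, "result_field": field,
                     "source_table": step["source_table"],
                     "source_field": step["source_field"],
                     "filter": step.get("filter"),
                     "function": step.get("function")})
        table, field = step["source_table"], step["source_field"]
    path.reverse()  # most upstream processing node first
    return path


parsed_code = {
    ("dws_sales", "amt"): {"source_table": "dwd_orders", "source_field": "amount",
                           "filter": "status = 'paid'", "function": "SUM"},
    ("dwd_orders", "amount"): {"source_table": "ods_orders", "source_field": "amount",
                               "filter": None, "function": None},
}
lineage = trace_lineage("dws_sales", "amt", parsed_code, {"ods_orders"})
print(len(lineage), lineage[0]["source_table"], lineage[-1]["function"])
```

The resulting list plays the role of the machining information: one entry per processing node, each holding the source field, result field, filter condition and processing function.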
With the above embodiments, not only can data differences be located automatically, but data can also be purified intelligently. The system recommends to consumers data that has high similarity and performs well, forming a survival-of-the-fittest mechanism. Data that performs poorly is gradually used by fewer and fewer applications and can eventually be taken offline entirely, which reduces the storage of existing data and optimizes the data architecture.
It should be noted that, for brevity, each of the foregoing method embodiments is expressed as a series of combined actions, but those skilled in the art should know that the application is not limited by the described order of actions, because according to the application some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the application.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the methods described in the embodiments of the application.
Embodiment 2
According to an embodiment of the application, a data table processing apparatus for implementing the above data table processing method is further provided. As shown in Fig. 8, the apparatus includes: a first comparing unit 81, an information acquisition unit 83, a second comparing unit 85 and a difference positioning unit 87.
The first comparing unit is used for comparing the first field in the first data table with the second field in the second data table;
The information acquisition unit is used for obtaining, when a difference is found in the identification information of the first field and the second field, the machining information of the first field and the machining information of the second field, where the machining information records multiple processing logics in the machining path of the corresponding field;
The second comparing unit is used for comparing, along the machining path, each processing logic of the corresponding fields;
The difference positioning unit is used for determining, if the processing logic currently compared is inconsistent, that the processing logic currently compared is the logic in which the difference exists.
With the above embodiment, when a difference is found between a first field and a second field that have identical identification information in the data tables to be analyzed, the processing logic of the first field and the second field is compared automatically. If the processing logic differs, the differing logic is where the problem lies that caused the difference between the fields with identical identification information in the data tables to be analyzed. Thus, when fields that should be identical in two data tables differ, the source of the difference can be located automatically from the processing logic of the corresponding fields, which improves processing accuracy. The application solves the prior-art problem of low efficiency when comparing data table contents and improves the efficiency of data table comparison.
The identification information is information for identifying a field; the identification information of a field points to one field, for example the field name or the processing logic of the field.
Comparing the first field in the first data table with the second field in the second data table can be realized by checking whether fields that originally had identical identification information in the data tables to be analyzed still meet a predetermined comparison condition. If the fields with identical identification information still meet the predetermined comparison condition, the operation of comparing the first field in the first data table with the second field in the second data table is performed again. If the comparison shows that the fields no longer meet the predetermined comparison condition, a difference has appeared in the fields with identical identification information in the data tables to be analyzed. In essence, the difference appears because new processing has been further applied to the fields with identical identification information, and the above apparatus can locate the processing logic that caused the difference.
The predetermined comparison condition in the above embodiment can be determined based on the comparison scenario and may include: identical field names; identical field processing logic; identical field metadata and processing logic.
In the above embodiment, the obtained machining information of a field records the processing logic in the machining path of that field, and the processing logic may include at least one of the following: the source field, result field, filter condition and processing function of the corresponding processing node.
When processing logic is compared, the source field, result field, filter condition and processing function of the processing nodes can be compared. If the processing logic currently compared is inconsistent, the processing logic currently compared is determined to be the difference logic, thereby locating where the difference appears.
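Locating the difference logic can be sketched as a node-by-node comparison of two machining paths. The `Logic` tuple and the function `locate_difference` are hypothetical names introduced for illustration; the four compared attributes are the ones listed in the text.

```python
from typing import List, NamedTuple, Optional, Tuple


class Logic(NamedTuple):
    """Processing logic of one processing node (hypothetical structure)."""
    source_field: str
    result_field: str
    filter_condition: Optional[str] = None
    function: Optional[str] = None


def locate_difference(path_a: List[Logic],
                      path_b: List[Logic]) -> Optional[Tuple[int, List[str]]]:
    """Return (node index, differing attributes) for the first inconsistent node."""
    for index, (a, b) in enumerate(zip(path_a, path_b)):
        if a != b:
            differing = [name for name in Logic._fields
                         if getattr(a, name) != getattr(b, name)]
            return index, differing
    if len(path_a) != len(path_b):
        # One path has extra processing nodes after the common prefix.
        return min(len(path_a), len(path_b)), ["extra processing node"]
    return None  # all processing logic is consistent


before = [Logic("q", "t1"), Logic("t1", "m", "status = 'paid'", "SUM")]
after = [Logic("q", "t1"), Logic("t1", "m", "status <> 'void'", "SUM")]
print(locate_difference(before, after))
```

In this sample, the second node (index 1) is reported with `filter_condition` as the differing attribute, which corresponds to the position at which the difference appears.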
In another optional embodiment, the apparatus further includes a field determining unit for obtaining the identification information (e.g. the machining information) of the first field and determining the second field in the second data table that has the same identification information as the first field.
Besides specifying the fields to compare, the matching field that has the same identification information as the target field of the first data table can be retrieved through the field blood relationship of the target field, i.e. its blood relationship path. The matching field may be located in another data table; in that case the machining paths of the target field and the matching field are exactly the same, and the predetermined comparison rule (i.e. the predetermined comparison condition above) is derived through the above step of determining the matching field.
Optionally, identical identification information includes: identical field names; or identical field metadata and processing logic (for example, fields obtained by processing the same metadata with the same processing logic have identical identification information).
The identification information includes a field name, where the first comparing unit includes: a first comparison module for comparing whether the field names of the first field and the second field are the same; and a first difference determining module for determining, if the field names of the first field and the second field differ, that a difference has appeared in the identification information of the first field and the second field.
In an optional embodiment, the identification information includes field metadata and processing logic, where the first comparing unit includes: a second comparison module for comparing whether the field metadata and processing logic of the first field and the second field are the same; and a second difference determining module for determining, if the field metadata and processing logic of the first field and the second field differ, that a difference has appeared in the identification information of the first field and the second field.
In an optional embodiment, the apparatus further includes a field designating unit for obtaining designation information before the first field in the first data table is compared with the second field in the second data table, where the designation information specifies the first field and the second field.
With the above embodiment, the fields to be compared can be specified directly. This scheme can be applied to data migration: the data before and after migration can be monitored to verify whether the migration is complete, and when the migration is found to be incomplete, the cause of the difference is located automatically through the field blood relationship.
According to the above embodiment of the application, the apparatus may further include, as shown in Fig. 9: an information acquisition unit 91 for obtaining, before the first field in the first data table is compared with the second field in the second data table, the machining information of each field of each data table among the data tables to be analyzed, where the machining information of a field is at least used to record each processing logic in the machining path of that field; a judging unit 93 for judging, using the processing logic in the machining information, whether the fields are fields with identical identification information, to obtain a judgment result; a statistics unit 95 for counting, according to the judgment result, the number of fields with identical identification information between each pair of data tables among the data tables to be analyzed; a computing unit 97 for calculating the similarity of each pair of data tables based on the number; and a table acquiring unit 99 for obtaining multiple second data tables whose similarity to the first data table meets a preset similarity condition.
By comparing the blood relationships before and after, and comparing the output of each step in the blood relationship, if the information of some processing node in the blood relationship is inconsistent, that processing node is automatically located as the node where the problem occurred.
With the above embodiment, data are cross-checked according to the blood relationship between them. For example, comparison rules are configured between the first-layer source data (such as data extracted from online tables, or data in online tables) and the end consumption data, such as rules between fields with identical identification information, so that problems are flagged early and the cause of a difference is located automatically through the field blood relationship.
Based on the above embodiments, the application also provides an apparatus for determining data table similarity.
Specifically, the judging unit includes: a first judging module for judging that two fields have identical identification information if each processing logic of the two fields is consistent; and a second judging module for judging that two fields have different identification information if the two fields have differing processing logic, where the judgment result includes information on the fields with identical identification information and information on the fields with different identification information.
In an optional embodiment, when the data granularity of two fields is the same, if the source fields of the first processing nodes of the two fields are consistent and the result fields of their last processing nodes are consistent, the two fields have identical identification information.
Specifically, the computing unit is used for:
calculating the similarity P of each pair of data tables according to the following formula:
P = Y*2/(M+N), where Y is the number of fields with identical identification information shared by the pair of data tables, M is the number of fields in one data table of the pair, and N is the number of fields in the other data table of the pair.
According to the above embodiment of the application, the apparatus may further include a sorting unit for sorting, after multiple second data tables whose similarity to the first data table meets the preset similarity condition are obtained, the multiple second data tables by health attribute and quality attribute to obtain ranking information, where the health attribute characterizes the resource consumption of a data table and the quality attribute at least characterizes the completeness and reliability of the information in a data table.
Further, the apparatus also includes a receiving unit for receiving at least one of the following before the machining information of each field of each data table among the data tables to be analyzed is obtained: a push request for obtaining tables similar to the first data table, the data tables to be analyzed being obtained based on the push request, where the data tables to be analyzed include the first data table; a processing task for processing data, the identifier of the first data table being extracted from the processing task and used to obtain the data tables to be analyzed; a clean-up task for cleaning up the first data table, the data tables to be analyzed being obtained based on the clean-up task.
It should further be noted that the apparatus also includes an information output unit for outputting information, after the ranking information is obtained, in one of the following ways: when a push request has been received, using the ranking information as the push information in response to the push request; when a processing task has been received, replacing the first data table in the processing task with the first-ranked second data table in the ranking information; when a clean-up task has been received, merging the first q second data tables in the ranking information with the first data table, where q is a natural number.
The similarity of any two data tables can be calculated by the above method, and this similarity calculation can be applied in data recommendation and data purification scenarios.
With the above embodiment, tables with high similarity can be ranked in search results by health score and quality score, so that better data is recommended to consumers. Through the consumers' choices, homogeneous data with few downstream applications can be gradually cleaned up and taken offline, achieving intelligent optimization of data applications. The similarity between each pair of data tables can also be used to determine, for the tables and fields referenced by a task, whether a substitute table with higher quality and health scores exists, and to guide the user to the better table.
It should be noted that any one embodiment of the application can determine the field of a field through the above way Blood relationship, namely determine the machining information of field.
Specifically, information acquisition unit includes:Parsing module, for the processing generation using tables of data where corresponding field Code, parses the source table of the processing node of each in the machining path of corresponding field, until source table is the extraction of online table Table;Logging modle, the processing logic for recording each processing node, wherein, processing logic includes:Carry out source word Also include filter condition and/or processing function in section and result field, processing logic.
In the above embodiment of the application, field lineage parsing may start from one field of a table: the primary key of the table is determined, and then the upstream source table of the primary key field (which may include the source field mentioned above), the result field, the filter conditions (including direct field filtering and join filtering between tables), and the functions used (i.e., the processing functions in the above embodiment) are parsed. If the upstream source table is not an online table, or an associated table in the filter conditions is not an extraction table of an online table, tracing continues upstream in the manner shown in Fig. 7 until the upstream tables and the tables in the filter conditions are all extraction tables of online tables. The path traced at each step is recorded to generate the processing information.
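The upstream tracing described above can be sketched as a simple loop. This is a simplified illustration under stated assumptions: the `catalog` mapping stands in for the result of parsing the processing code, each field is assumed to have a single upstream source (real lineage may branch through joined tables in filter conditions), and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ProcessingLogic:
    source_table: str
    source_field: str
    result_table: str
    result_field: str
    filter_condition: Optional[str] = None
    function: Optional[str] = None


# (result table, result field) -> logic that produces it; assumed to be pre-parsed
Catalog = Dict[Tuple[str, str], ProcessingLogic]


def trace_lineage(table: str, field: str, catalog: Catalog,
                  extraction_tables: Set[str]) -> List[ProcessingLogic]:
    """Walk upstream, recording each processing node, until an extraction table."""
    path: List[ProcessingLogic] = []
    current = (table, field)
    seen: Set[Tuple[str, str]] = set()
    while current[0] not in extraction_tables:
        if current in seen:
            raise ValueError(f"cyclic lineage at {current}")
        seen.add(current)
        logic = catalog.get(current)
        if logic is None:
            break  # no recorded upstream source; stop tracing
        path.append(logic)
        current = (logic.source_table, logic.source_field)
    return path
```

The returned list runs from the starting field toward its origin, so the last entry names the extraction table the field ultimately comes from.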
It should be noted that the modules or units in the above embodiments of the application implement the same examples and application scenarios as the corresponding steps, but are not limited to the disclosure of Embodiment 1 above. It should also be noted that the above units, as part of the device, may run in the computer terminal provided in Embodiment 1 and may be implemented in software or in hardware.
It should be noted that those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the data table processing device described above may refer to the corresponding process in the foregoing method embodiment, and is not repeated here.
Embodiment 3
Embodiments of the application may provide a computer terminal, which may be any computer terminal in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one of multiple network devices of a computer network.
In this embodiment, the computer terminal may execute the following steps of the data table processing method:
Compare a first field in a first data table with a second field in a second data table; when the comparison finds a difference in the identification information of the first field and the second field, obtain the processing information of the first field and the processing information of the second field, where the processing information records multiple pieces of processing logic on the processing path of the corresponding field; compare, along the processing path, each piece of processing logic of each corresponding field; if the processing logic currently being compared is inconsistent, determine that the processing logic currently being compared is the logic in which the difference occurs.
With the above embodiment, when the comparison finds a difference between a first field and a second field that have the same identification information in the data tables to be analyzed, the processing logic of the first field and the second field is compared automatically; if the processing logic differs, the differing logic is the source of the difference between the fields with the same identification information. In this way, when fields that should be identical in two data tables differ, the source of the difference can be located automatically from the processing logic of the corresponding fields, which improves processing accuracy. The application thus solves the prior-art problem of low efficiency when comparing data table contents, and improves the efficiency of data table comparison.
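The step-by-step comparison can be sketched as follows. Each piece of processing logic is modeled as a plain tuple (source, filter condition, function), which is an assumption made for illustration; the function name is likewise hypothetical.

```python
from itertools import zip_longest
from typing import List, Optional, Sequence, Tuple

Logic = Tuple[str, ...]   # one piece of processing logic, e.g. (source, filter, function)


def locate_differences(first_path: Sequence[Logic],
                       second_path: Sequence[Logic]
                       ) -> List[Tuple[int, Optional[Logic], Optional[Logic]]]:
    """Compare two processing paths step by step and report inconsistent steps."""
    differences = []
    # zip_longest pads the shorter path with None, so a missing step also counts.
    for step, (a, b) in enumerate(zip_longest(first_path, second_path)):
        if a != b:
            differences.append((step, a, b))
    return differences
```

An empty result means the two fields were processed identically, so any difference in their values would have to come from somewhere other than the recorded logic.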
Optionally, Fig. 10 is a network environment diagram of a computer terminal according to an embodiment of the application. As shown in Fig. 10, the computer terminal 101 may be connected to a server 102 through a network, and the computer terminal may include one or more processors (only one is shown in the figure) and a memory, as shown in Fig. 1.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the data table processing method and device in the embodiments of the application. The processor runs the software programs and modules stored in the memory to execute various functional applications and data processing, that is, to implement the data table processing method described above. The memory may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to terminal A through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
Those of ordinary skill in the art can understand that the structure shown in Fig. 10 is only illustrative; the computer terminal may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (MID), or a PAD. Fig. 10 does not limit the structure of the above electronic device. For example, the computer terminal 10 may include more or fewer components than shown in Fig. 10 (such as a network interface or a display device), or have a configuration different from that shown in Fig. 10.
Those of ordinary skill in the art can understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing hardware related to the terminal device. The program can be stored in a computer-readable storage medium, and the storage medium may include a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
Embodiment 4
Embodiments of the application also provide a storage medium. Optionally, in this embodiment, the storage medium may be used to store the program code executed by the data table processing method provided in Embodiment 1 above.
Optionally, in this embodiment, the storage medium may be located in any computer terminal in a computer terminal group in a computer network, or in any mobile terminal in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for executing the following steps: compare a first field in a first data table with a second field in a second data table; when the comparison finds a difference in the identification information of the first field and the second field, obtain the processing information of the first field and the processing information of the second field, where the processing information records multiple pieces of processing logic on the processing path of the corresponding field; compare, along the processing path, each piece of processing logic of each corresponding field; if the processing logic currently being compared is inconsistent, determine that the processing logic currently being compared is the logic in which the difference occurs.
The serial numbers of the above embodiments of the application are for description only and do not represent the merits of the embodiments.
In the above embodiments of the application, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in the application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the application. The storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above is only the preferred embodiment of the application. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principle of the application, and these improvements and refinements should also be regarded as falling within the scope of protection of the application.

Claims (15)

1. A data table processing method, characterized by comprising:
comparing a first field in a first data table with a second field in a second data table;
when the comparison finds a difference in the identification information of the first field and the second field, obtaining processing information of the first field and processing information of the second field, wherein the processing information records multiple pieces of processing logic on the processing path of the corresponding field;
comparing, along the processing path, each piece of processing logic of each corresponding field;
if the processing logic currently being compared is inconsistent, determining that the processing logic currently being compared is the logic in which the difference occurs.
2. The method according to claim 1, characterized in that, before comparing the first field in the first data table with the second field in the second data table, the method further comprises:
obtaining the identification information of the first field, and determining the second field in the second data table that has the same identification information as the first field.
3. The method according to claim 2, characterized in that the identification information includes a field name, wherein comparing the first field in the first data table with the second field in the second data table comprises:
comparing whether the field names of the first field and the second field are the same;
if the field names of the first field and the second field are different, determining that the comparison finds a difference in the identification information of the first field and the second field.
4. The method according to claim 2, characterized in that the identification information includes field metadata and processing logic, wherein comparing the first field in the first data table with the second field in the second data table comprises:
comparing whether the field metadata and processing logic of the first field and the second field are the same;
if the field metadata and processing logic of the first field and the second field are different, determining that the comparison finds a difference in the identification information of the first field and the second field.
5. The method according to claim 1, characterized in that, before comparing the first field in the first data table with the second field in the second data table, the method further comprises:
obtaining processing information of each field of each data table among the data tables to be analyzed;
using the processing logic in the processing information to judge whether the fields are fields with the same identification information, to obtain a judgment result;
counting, according to the judgment result, the number of fields with the same identification information shared by each pair of data tables among the data tables to be analyzed;
calculating the similarity of each pair of data tables based on the number;
obtaining multiple second data tables whose similarity to the first data table meets a preset similarity condition.
6. The method according to claim 5, characterized in that using the processing logic in the processing information to judge whether the fields are fields with the same identification information comprises:
if every piece of processing logic of two fields is consistent, judging that the two fields are fields with the same identification information;
if the two fields have different processing logic, judging that the two fields are fields with different identification information.
7. The method according to claim 5, characterized in that, after obtaining the multiple second data tables whose similarity to the first data table meets the preset similarity condition, the method further comprises:
sorting the multiple second data tables according to a health attribute and a quality attribute to obtain ranking information,
wherein the health attribute characterizes the resource consumption value of a data table, and the quality attribute characterizes at least the completeness and reliability of the information in a data table.
8. The method according to claim 7, characterized in that, before obtaining the processing information of each field of each data table among the data tables to be analyzed, the method further comprises at least one of the following:
receiving a push request for obtaining tables similar to the first data table, and obtaining the data tables to be analyzed based on the push request, wherein the data tables to be analyzed include the first data table;
receiving a processing task for processing data, extracting the identifier of the first data table from the processing task, and obtaining the data tables to be analyzed using the identifier of the first data table;
receiving a clean-up task for cleaning up the first data table, and obtaining the data tables to be analyzed based on the clean-up task.
9. The method according to claim 8, characterized in that, after the ranking information is obtained, the method further comprises:
when the push request is received, using the ranking information as the push information returned in response to the push request;
when the processing task is received, replacing the first data table in the processing task with the first-ranked second data table in the ranking information;
when the clean-up task is received, merging the top q second data tables in the ranking information with the first data table, wherein q is a natural number.
10. The method according to any one of claims 1 to 9, characterized in that obtaining the processing information of the first field and the processing information of the second field comprises:
using the processing code of the data table containing the corresponding field to parse the source table of each processing node on the processing path of the corresponding field, until the source table is an extraction table of an online table;
recording the processing logic of each processing node, wherein the processing logic includes a source field and a result field, and further includes a filter condition and/or a processing function.
11. A data table processing device, characterized by comprising:
a first comparing unit, configured to compare a first field in a first data table with a second field in a second data table;
an information acquisition unit, configured to obtain processing information of the first field and processing information of the second field when the comparison finds a difference in the identification information of the first field and the second field, wherein the processing information records multiple pieces of processing logic on the processing path of the corresponding field;
a second comparing unit, configured to compare, along the processing path, each piece of processing logic of each corresponding field;
a difference locating unit, configured to determine, if the processing logic currently being compared is inconsistent, that the processing logic currently being compared is the logic in which the difference occurs.
12. The device according to claim 11, characterized in that the device further comprises:
a field determining unit, configured to obtain the identification information of the first field and determine the second field in the second data table that has the same identification information as the first field.
13. The device according to claim 12, characterized in that the identification information includes a field name, wherein the first comparing unit comprises:
a first comparison module, configured to compare whether the field names of the first field and the second field are the same;
a first difference determining module, configured to determine, if the field names of the first field and the second field are different, that the comparison finds a difference in the identification information of the first field and the second field.
14. The device according to claim 12, characterized in that the identification information includes field metadata and processing logic, wherein the first comparing unit comprises:
a second comparison module, configured to compare whether the field metadata and processing logic of the first field and the second field are the same;
a second difference determining module, configured to determine, if the field metadata and processing logic of the first field and the second field are different, that the comparison finds a difference in the identification information of the first field and the second field.
15. The device according to claim 11, characterized in that the device further comprises:
an information acquisition unit, configured to obtain, before the first field in the first data table is compared with the second field in the second data table, processing information of each field of each data table among the data tables to be analyzed;
a judging unit, configured to use the processing logic in the processing information to judge whether the fields are fields with the same identification information, to obtain a judgment result;
a statistics unit, configured to count, according to the judgment result, the number of fields with the same identification information shared by each pair of data tables among the data tables to be analyzed;
a computing unit, configured to calculate the similarity of each pair of data tables based on the number;
a table acquiring unit, configured to obtain multiple second data tables whose similarity to the first data table meets a preset similarity condition.
CN201610197071.4A 2016-03-31 2016-03-31 Data table processing method and device Active CN107291672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610197071.4A CN107291672B (en) 2016-03-31 2016-03-31 Data table processing method and device

Publications (2)

Publication Number Publication Date
CN107291672A true CN107291672A (en) 2017-10-24
CN107291672B CN107291672B (en) 2020-11-20

Family

ID=60087795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610197071.4A Active CN107291672B (en) 2016-03-31 2016-03-31 Data table processing method and device

Country Status (1)

Country Link
CN (1) CN107291672B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020059228A1 (en) * 2000-07-31 2002-05-16 Mccall Danny A. Reciprocal data file publishing and matching system
US20080086514A1 (en) * 2006-10-04 2008-04-10 Salesforce.Com, Inc. Methods and systems for providing fault recovery to side effects occurring during data processing
CN101079026A (en) * 2007-07-02 2007-11-28 北京百问百答网络技术有限公司 Text similarity, acceptation similarity calculating method and system and application system
CN102411588A (en) * 2010-09-26 2012-04-11 金蝶软件(中国)有限公司 Comparison checking method and system of data table
CN103324656A (en) * 2012-03-22 2013-09-25 乐金信世股份有限公司 Database management method and database management server thereof
CN104063377A (en) * 2013-03-18 2014-09-24 联想(北京)有限公司 Information processing method and electronic equipment using same
CN104239301A (en) * 2013-06-06 2014-12-24 阿里巴巴集团控股有限公司 Data comparing method and device
CN103473283A (en) * 2013-08-29 2013-12-25 中国测绘科学研究院 Method for matching textual cases
CN103530334A (en) * 2013-09-29 2014-01-22 方正国际软件有限公司 System and method for data matching based on comparison module
CN103678620A (en) * 2013-12-18 2014-03-26 国家电网公司 Knowledge document recommendation method based on user historical behavior features

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BARNES TIFFANY 等: "Automatic hint generation for logic proof tutoring using historical data", 《JOURNAL OF EDUCATIONAL TECHNOLOGY & SOCIETY》 *
BILENKO MIKHAIL 等: "Adaptive name matching in information integration", 《IEEE INTELLIGENT SYSTEMS》 *
张子卿: "智慧商圈中个性化推荐系统的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
方方 等: "信息系统性能监测评估平台的研究与实现", 《微型电脑应用》 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256113B (en) * 2018-02-09 2020-06-16 口碑(上海)信息技术有限公司 Data blood relationship mining method and device
CN108256113A (en) * 2018-02-09 2018-07-06 口碑(上海)信息技术有限公司 The method for digging and device of data genetic connection
CN109240909A (en) * 2018-08-03 2019-01-18 北京马上慧科技术有限公司 A kind of data file verification method based on registration center
CN109241068A (en) * 2018-08-22 2019-01-18 中国平安人寿保险股份有限公司 The method, apparatus and terminal device that foreground and background data compares
CN109241068B (en) * 2018-08-22 2023-04-07 中国平安人寿保险股份有限公司 Method and device for comparing foreground and background data and terminal equipment
CN110210222B (en) * 2018-10-24 2023-01-31 腾讯科技(深圳)有限公司 Data processing method, data processing apparatus, and computer-readable storage medium
CN110210222A (en) * 2018-10-24 2019-09-06 腾讯科技(深圳)有限公司 Data processing method, data processing equipment and computer readable storage medium
CN109597802B (en) * 2018-12-07 2020-12-01 江苏满运软件科技有限公司 Database assertion data generation method, system, device, and medium
CN109597802A (en) * 2018-12-07 2019-04-09 江苏满运软件科技有限公司 Database assertion data generation method, system, equipment and medium
CN109783697B (en) * 2018-12-14 2021-04-27 北京海数宝科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN109783697A (en) * 2018-12-14 2019-05-21 北京海数宝科技有限公司 Data processing method, device, computer equipment and storage medium
CN111723087A (en) * 2019-03-19 2020-09-29 北京沃东天骏信息技术有限公司 Mining method and device of data blood relationship, storage medium and electronic equipment
CN111723087B (en) * 2019-03-19 2023-11-10 北京沃东天骏信息技术有限公司 Data blood relationship mining method and device, storage medium and electronic equipment
CN110889286B (en) * 2019-10-12 2022-04-12 平安科技(深圳)有限公司 Dependency relationship identification method and device based on data table and computer equipment
CN110889286A (en) * 2019-10-12 2020-03-17 平安科技(深圳)有限公司 Dependency relationship identification method and device based on data table and computer equipment
CN112988698A (en) * 2019-12-02 2021-06-18 阿里巴巴集团控股有限公司 Data processing method and device
CN111309795A (en) * 2020-01-21 2020-06-19 北京百度网讯科技有限公司 Service abnormity positioning method, device, electronic equipment and medium
CN112711591A (en) * 2020-12-31 2021-04-27 天云融创数据科技(北京)有限公司 Data blood margin determination method and device based on field level of knowledge graph
CN114722075A (en) * 2021-01-04 2022-07-08 中国移动通信集团山东有限公司 Data stream processing method and device, server and storage medium
CN112817984A (en) * 2021-02-22 2021-05-18 杭州数梦工场科技有限公司 Data processing method and device, and data source obtaining method and device
CN112817984B (en) * 2021-02-22 2023-10-20 杭州数梦工场科技有限公司 Data processing method and device, and data source acquisition method and device
CN114547314A (en) * 2022-04-25 2022-05-27 北京安华金和科技有限公司 Data classification and classification method and system based on master-slave table
CN114547314B (en) * 2022-04-25 2022-07-05 北京安华金和科技有限公司 Data classification and classification method and system based on master-slave table

Also Published As

Publication number Publication date
CN107291672B (en) 2020-11-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant