CN112182116A - Data probing method and device - Google Patents


Info

Publication number
CN112182116A
Authority
CN
China
Prior art keywords
field
target
data table
type
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010980079.4A
Other languages
Chinese (zh)
Inventor
叶宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010980079.4A priority Critical patent/CN112182116A/en
Publication of CN112182116A publication Critical patent/CN112182116A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of this specification provide a data probing method and device applied to the field of supervision or compliance. The method comprises: determining, according to a set classification rule, the data table type of a target data table to be probed; determining the target field characteristics corresponding to a target field to be probed in the target data table; and, based on the target field characteristics and a pre-trained data probing model, probing the source field corresponding to the target field from a source data table of the same type as the target data table. The data probing model is trained on the association relationship between the field characteristics of target-end fields and the field characteristics of the corresponding source-end fields.

Description

Data probing method and device
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data probing method and apparatus.
Background
With the continuous development of information technology, data has entered the big data era, and business operations depend more and more heavily on data. For example, a business result table in a business domain may be produced by processing one or more source data tables in the data mart layer. In some scenarios, such as data supervision or data compliance, the source of a business data table needs to be probed.
However, a business data table may have gone through multiple rounds of processing on top of the source data tables in the data mart layer. How to probe the source tables of a business data table is therefore a technical problem that currently needs to be solved.
Disclosure of Invention
An embodiment of this specification provides a data probing method, including: determining, according to a set classification rule, the data table type of a target data table to be probed; determining the target field characteristics corresponding to a target field to be probed in the target data table; and determining, based on the target field characteristics and a pre-trained data probing model, the source field corresponding to the target field from a source data table of the same type as the target data table. The data probing model is trained on the association relationship between the field characteristics of target-end fields and the field characteristics of the corresponding source-end fields.
An embodiment of this specification further provides a data probing apparatus, including: a first determining module, configured to determine, according to a set classification rule, the data table type of a target data table to be probed; a second determining module, configured to determine the target field characteristics corresponding to a target field to be probed in the target data table; and a probing module, configured to determine, based on the target field characteristics and a pre-trained data probing model, the source field corresponding to the target field from a source data table of the same type as the target data table. The data probing model is trained on the association relationship between the field characteristics of target-end fields and the field characteristics of the corresponding source-end fields.
An embodiment of this specification further provides data probing equipment, including a processor and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: determine, according to a set classification rule, the data table type of a target data table to be probed; determine the target field characteristics corresponding to a target field to be probed in the target data table; and probe, based on the target field characteristics and a pre-trained data probing model, the source field corresponding to the target field from a source data table of the same type as the target data table. The data probing model is trained on the association relationship between the field characteristics of target-end fields and the field characteristics of the corresponding source-end fields.
An embodiment of this specification further provides a storage medium for storing computer-executable instructions that, when executed, implement the following process: determining, according to a set classification rule, the data table type of a target data table to be probed; determining the target field characteristics corresponding to a target field to be probed in the target data table; and probing, based on the target field characteristics and a pre-trained data probing model, the source field corresponding to the target field from a source data table of the same type as the target data table. The data probing model is trained on the association relationship between the field characteristics of target-end fields and the field characteristics of the corresponding source-end fields.
Drawings
To illustrate the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a diagram of the data table hierarchy from a source data table to a target data table in a data probing method provided in an embodiment of this specification;
Fig. 2 is a first flowchart of a data probing method provided in an embodiment of this specification;
Fig. 3 is a second flowchart of a data probing method provided in an embodiment of this specification;
Fig. 4 is a schematic block diagram of a data probing apparatus provided in an embodiment of this specification;
Fig. 5 is a schematic structural diagram of a data probing apparatus provided in an embodiment of this specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in this document, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art without making any inventive step based on the embodiments in this description shall fall within the scope of protection of this document.
The idea of the embodiments of this specification is to train a data probing model on the association relationship between the field characteristics of fields in the target-end data table and in the source-end data table. When probing which source-end field a target-end field corresponds to, the trained model can directly probe the corresponding source field from the field characteristics of the field to be probed. This avoids tracing the target data table back layer by layer along its blood relationship (data lineage), so probing takes less time and is markedly more efficient. Because the probing of the target-end data table is automated by the data probing model, no manual analysis is required, which saves a large amount of labor; and because the probing is performed by a model, the probing is more objective. On this basis, the embodiments of this specification provide a data probing method and device, described in detail below.
First, to facilitate understanding of the method provided by the embodiments of this specification, the data table hierarchy from a source data table to a target data table is described. As shown in Fig. 1, the business result table in the business domain is obtained by processing the source-end data tables in the data mart layer, which finally yields the target-end business result table. The source data tables go through several rounds of processing, and these rounds produce several intermediate data tables. That is, the target-end business result table is obtained from the source-end data tables in the data mart layer through at least one middle-layer data table. With the method provided by the embodiments of this specification, the fields in the source-end data tables of the data mart layer that correspond to fields in the business data table can be probed directly, without probing the middle-layer data tables layer by layer.
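For contrast, the layer-by-layer tracing that the method is meant to avoid can be sketched as a walk over a lineage graph. This is only an illustration of the baseline, not part of the claimed method; the table names and the `LINEAGE` mapping are hypothetical.

```python
from typing import Dict, List

# Hypothetical lineage: each table maps to the tables it was derived from.
LINEAGE: Dict[str, List[str]] = {
    "biz_result": ["mid_2"],
    "mid_2": ["mid_1"],
    "mid_1": ["src_orders", "src_users"],
}


def trace_to_sources(table: str, lineage: Dict[str, List[str]]) -> List[str]:
    """Walk the lineage one layer at a time until tables with no parents remain."""
    parents = lineage.get(table)
    if not parents:
        return [table]  # no recorded parents: treat as a source-end table
    sources: List[str] = []
    for parent in parents:
        for source in trace_to_sources(parent, lineage):
            if source not in sources:
                sources.append(source)
    return sources
```

Every intermediate layer has to be visited, and the lineage itself has to be known, which is the cost the model-based probing aims to remove.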
A specific application scenario of the method provided in the embodiments of this specification may be the field of data supervision, where the source data table of the data under supervision is probed. Alternatively, the method may be applied to the field of data compliance, where target-end data is probed during a data compliance audit.
It should be noted that the data probing method provided in the embodiments of this specification may be applied to a terminal device with computing capability, such as a computer or a tablet computer, or to a service platform that needs to perform data probing, such as a data supervision platform.
Fig. 2 is a first flowchart of a data probing method provided in an embodiment of this specification. As shown in Fig. 2, the method includes at least the following steps:
Step 202, determining, according to a set classification rule, the data table type of the target data table to be probed.
Optionally, in a specific embodiment, data tables may be divided into types in advance along a chosen dimension, that is, the full set of data table types is determined beforehand. For example, along one such dimension, the data table types may include order tables, product tables, user information tables, and the like. The data table types may also be determined along other dimensions; this is only an example and does not limit the embodiments of this specification.
Therefore, when step 202 is performed, the data table type of the target data table may be selected from the data table types obtained by the prior division. Specifically, the set classification rule may be field semantic matching.
In the embodiments of this specification, determining the data table type of the target data table allows the target data table to be probed only against source data tables of the same type. This narrows the range of data probing, which reduces the workload of subsequent probing and improves probing efficiency.
Step 204, determining the target field characteristics corresponding to the target fields needing to be probed in the target data table.
Optionally, in a specific embodiment, one field, several fields, or all fields in the target data table may be probed, depending on actual requirements. For example, in the field of data supervision, if the target data table under supervision contains five fields (field 1, field 2, field 3, field 4 and field 5) and field 3 currently needs to be supervised, then in this scenario the source field corresponding to field 3 needs to be probed.
Specifically, the target field characteristics may be semantic characteristics of a field, content characteristics of content corresponding to the field, statistical characteristics of content corresponding to the field, and the like. The specific content of the above target field features may be determined based on specific fields, which is not limited by the embodiments of the present specification.
Step 206, based on the target field characteristics and the pre-trained data probing model, probing the source field corresponding to the target field from a source data table of the same type as the target data table.
The data probing model is trained on the association relationship between the field characteristics of target-end fields and the field characteristics of the corresponding source-end fields.
Specifically, the association relationship may be understood as the field characteristics being equal, or being related by some operation, such as a summation or a multiple relationship. In other words, the data probing model is derived mainly from the process, or the conditions, by which a source field is converted into a target field. For example, in one embodiment, the field value of the target field is obtained by dividing the field value of the source field by a set value; in this case, the data probing model requires that the field value of the target field multiplied by the set multiple equals the field value of the source field.
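The multiple relationship in this example can be sketched as a simple check. This is a minimal illustration under stated assumptions: the function name `satisfies_multiple`, the row-by-row alignment of source and target values, and the tolerance are all hypothetical and are not specified by the text.

```python
import math
from typing import Sequence


def satisfies_multiple(
    source_values: Sequence[float],
    target_values: Sequence[float],
    factor: float,
    rel_tol: float = 1e-9,
) -> bool:
    """Check that target * factor equals source for every aligned pair of values.

    Assumes the two sequences are aligned row by row (an assumption of this
    sketch; the text does not say how rows are paired).
    """
    if not source_values or len(source_values) != len(target_values):
        return False
    return all(
        math.isclose(t * factor, s, rel_tol=rel_tol)
        for s, t in zip(source_values, target_values)
    )
```

For instance, a source field stored in cents and a target field stored in yuan would satisfy the relationship with `factor=100`.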
With the data probing method provided by the embodiments of this specification, a data probing model is trained on the association relationship between the field characteristics of fields in the target-end data table and in the source-end data table. When probing which source-end field a target-end field corresponds to, the trained data probing model can directly probe the corresponding source field from the field characteristics of the field to be probed. This avoids tracing the target data table back layer by layer along its blood relationship (data lineage), so probing takes less time and is markedly more efficient. Because the probing of the target-end data table is automated by the data probing model, no manual analysis is required, which saves a large amount of labor; and because the probing is performed by a model, the probing is more objective.
In order to facilitate understanding of the methods provided by the embodiments of the present disclosure, the following detailed description will discuss specific implementation processes of the above steps.
Optionally, in the embodiments of this specification, because the source field corresponding to the target field is determined from source data tables of the same type as the target data table, all source data tables need to be classified in advance to obtain the source data table set corresponding to each data table type. For example, if the data table types include data table type A, data table type B and data table type C, then in a specific implementation the source data tables are divided into a source data table set corresponding to data table type A, a source data table set corresponding to data table type B, and a source data table set corresponding to data table type C.
Therefore, before step 202 is performed, that is, before the data table type of the target data table to be probed is determined according to the set classification rule, the method provided in the embodiments of this specification further includes:
Classifying the source data tables according to the set classification rule to obtain the source data table set corresponding to each data table type.
In the embodiments of this specification, the source data tables are divided in advance into several sets according to their data table types. When data probing is performed on a target data table, only the source data table set with the same data table type as the target data table needs to be probed, which reduces the workload of data probing, shortens the probing period, and improves probing efficiency.
For example, in a specific embodiment, if the data table type of the target data table is data table type B, the target field in the target data table only needs to be probed within the source data table set corresponding to data table type B, and probing within the source data table sets of the other data table types is skipped.
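The pre-classification and the narrowing of the probing range can be sketched as follows. This is an illustrative sketch only: the `(table name, field names)` shape, the function names, and `toy_classify` (a stand-in for the set classification rule) are assumptions introduced here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Table = Tuple[str, List[str]]  # (table name, field names): illustrative shape
Classifier = Callable[[List[str]], Optional[str]]


def group_source_tables(
    tables: Iterable[Table], classify: Classifier
) -> Dict[str, List[str]]:
    """Build one source data table set per data table type, ahead of probing."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, fields in tables:
        table_type = classify(fields)
        if table_type is not None:
            groups[table_type].append(name)
    return dict(groups)


def candidate_sources(
    target_fields: List[str], groups: Dict[str, List[str]], classify: Classifier
) -> List[str]:
    """Return only the source tables whose type matches the target table's type."""
    return groups.get(classify(target_fields), [])


def toy_classify(fields: List[str]) -> Optional[str]:
    """Hypothetical stand-in for the set classification rule."""
    if any("order" in f for f in fields):
        return "B"
    if any("user" in f for f in fields):
        return "A"
    return None
```

Because the same classifier is applied to source and target tables, a target table of type B is only ever compared against the type B set.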
In the embodiments of this specification, the data table type of the target data table is determined using the same classification rule as is used to classify the source data tables. Therefore, only one of the two implementation processes is described below as an example, and the other is not described again.
Optionally, in a specific embodiment, determining the data table type of the target data table to be probed according to the set classification rule may be implemented through the following process:
Matching the field names of the fields in the target data table against the type keywords corresponding to each data table type; and determining the data table type of the target data table according to the matching result.
Optionally, in a specific embodiment, the type keywords corresponding to each data table type may be set based on the fields that a data table of that type usually contains. For example, a transaction table usually contains field information such as an order number, a transaction amount, a transaction time and a transaction account number, so the type keywords for transaction tables may be set based on this field information; for example, the set type keywords may include order number, amount, time, account number, transaction, and the like. This is only an example and does not limit the embodiments of this specification.
In the embodiments of this specification, matching the field names of the fields in the target data table against the type keywords of each data table type may consist of counting, for each data table type, the type keywords that are identical or semantically similar to a field name in the target data table, and then determining the data table type with the largest count as the data table type of the target data table.
For ease of understanding, the following description will be given by way of example.
For example, in one embodiment, assume there are three data table types, denoted data table type A, data table type B and data table type C;
The type keywords corresponding to data table type A include: keyword 1, keyword 2 and keyword 3;
The type keywords corresponding to data table type B include: keywords 11, 12, 13 and 14;
The type keywords corresponding to data table type C include: keywords 111, 112, 113 and 114;
The field names of the fields contained in the target data table are: field 1, field 2 and field 3;
Field 1, field 2 and field 3 are each matched against the type keywords corresponding to data table type A, data table type B and data table type C. Suppose the matching result shows that, among the fields of the target data table, 1 field matches data table type A (a field matches when its name is the same as, or semantically similar to, a type keyword), 3 fields match data table type B, and 0 fields match data table type C. The fields of the target data table then match the type keywords of data table type B most often, that is, the matching degree between the fields of the target data table and the type keywords of data table type B is the highest. Therefore, the data table type of the target data table can be determined to be data table type B.
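The counting in this example can be sketched as below. This is a minimal sketch under assumptions: the keyword lists are hypothetical, and plain substring matching stands in for the "same or semantically similar" test, which the text does not define further.

```python
from typing import Dict, List, Optional

# Hypothetical type keywords; the text only says they are derived from the
# fields that a table of each type usually contains.
TYPE_KEYWORDS: Dict[str, List[str]] = {
    "transaction": ["order", "amount", "time", "account", "transaction"],
    "product": ["product", "sku", "price", "category"],
    "user_info": ["user", "name", "phone", "address"],
}


def count_matches(field_names: List[str], keywords: List[str]) -> int:
    """Count the fields whose name contains at least one type keyword."""
    return sum(
        any(keyword in name.lower() for keyword in keywords)
        for name in field_names
    )


def classify_table(
    field_names: List[str],
    type_keywords: Dict[str, List[str]] = TYPE_KEYWORDS,
) -> Optional[str]:
    """Return the table type with the most matching fields, or None if none match."""
    scores = {
        table_type: count_matches(field_names, keywords)
        for table_type, keywords in type_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A table with fields `order_no`, `trade_amount`, `pay_time` and `user_id` scores 3 for the transaction type and 1 for the user information type, so it is classified as a transaction table. Ties are resolved in favor of the first type listed, which is a choice of this sketch rather than something the text prescribes.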
Optionally, in a specific embodiment, the data probing model includes a probing rule for each field type, where a probing rule describes the matching rule that the field characteristics of associated fields satisfy;
Correspondingly, in step 206, probing the source field corresponding to the target field from a source data table of the same type as the target data table, based on the target field characteristics and the pre-trained data probing model, specifically includes the following steps 2062, 2064 and 2066, as shown in Fig. 3.
Step 2062, determining, according to the field type of the target field, the target probing rule corresponding to the target field in the data probing model;
Step 2064, matching the target field characteristics against the field characteristics of each source field of the same field type in the source data tables of the same type, to obtain a matching result between the target field and each such source field;
Step 2066, determining the source field whose matching result satisfies the target probing rule as the source field corresponding to the target field.
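Steps 2062 to 2066 can be sketched as a rule lookup followed by a filter. This is a sketch under stated assumptions, not the trained model itself: the `Field` tuple, the `PROBING_RULES` table, and in particular the fixed factor of 100 in the numeric rule are hypothetical, and the sketch collapses "matching" and "checking the matching result against the rule" into one boolean call.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence


class Field(NamedTuple):
    name: str
    field_type: str     # "numeric", "string" or "enumerable" in this sketch
    feature: Sequence   # field characteristic, here simply the field values


Rule = Callable[[Sequence, Sequence], bool]


def _numeric_rule(target: Sequence, source: Sequence) -> bool:
    # Hypothetical learned relation: target value * 100 == source value.
    return len(target) == len(source) and all(
        abs(t * 100 - s) < 1e-6 for t, s in zip(target, source)
    )


PROBING_RULES: Dict[str, Rule] = {
    "string": lambda target, source: tuple(target) == tuple(source),
    "enumerable": lambda target, source: set(target) == set(source),
    "numeric": _numeric_rule,
}


def probe_source_fields(
    target: Field,
    source_fields: List[Field],
    rules: Dict[str, Rule] = PROBING_RULES,
) -> List[str]:
    rule = rules[target.field_type]  # step 2062: pick the rule by field type
    same_type = [f for f in source_fields if f.field_type == target.field_type]
    # Step 2064: match the target characteristic against each candidate.
    results = [(f.name, rule(target.feature, f.feature)) for f in same_type]
    # Step 2066: keep the source fields whose result satisfies the rule.
    return [name for name, matched in results if matched]
```

Restricting the candidates to source fields of the same field type keeps a numeric target from ever being compared with, say, a string source.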
The field type may include a numeric field, a string field, an enumerable field, or a state field. Correspondingly, the data probing model includes a probing rule for numeric fields, a probing rule for string fields, a probing rule for enumerable fields, and a probing rule for state fields.
In order to facilitate understanding of the methods provided by the embodiments of the present specification, the following will exemplarily describe the probing rules corresponding to the above-mentioned field types. Of course, the description is only exemplary for the convenience of understanding, and does not limit the probing rules mentioned in the embodiments of the present specification.
For example, in one embodiment, the field value of a string-type field is a character string, so the probing rule set for this field type may be that the field value of the target field equals (is the same as) the field value of the source field. The field value of a numeric field, such as a transaction amount or a transaction number, is a specific number. In obtaining the target field from the source field, the source field may undergo unit conversion (which may involve multiplication or division by a multiple), addition or subtraction, statistical operations, and the like. The probing rule set for this field type may therefore be that the target field and the source field satisfy a set multiple relationship, or that their sum or difference is a set value, and so on. The field value of a state-type field is generally a state value, and in general the field takes a given state value only when a certain trigger condition is met.
For example, if the target field is a transaction state, and the transaction state is "success" when the transaction time and the transaction amount meet set conditions, the probing rule may be that the transaction time and transaction amount corresponding to the same transaction state satisfy the same set conditions in the target field and in the source field.
Of course, the above description is only exemplary to list several possible probing rules, and does not limit the embodiments of the present specification.
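The state-field rule from the transaction-state example can be sketched as a check that the same trigger condition explains the same state value in both tables. The row shape, the field names, and the condition `paid_in_time` are hypothetical; the text only says that the transaction time and amount meet "set conditions".

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Condition = Callable[[Row], bool]


def trigger_holds(
    rows: List[Row], state_field: str, state_value: str, condition: Condition
) -> bool:
    """True if a row carries `state_value` exactly when `condition(row)` holds."""
    return all(
        (row[state_field] == state_value) == condition(row) for row in rows
    )


def same_trigger(
    target_rows: List[Row],
    target_field: str,
    source_rows: List[Row],
    source_field: str,
    state_value: str,
    condition: Condition,
) -> bool:
    """Probing rule for state fields: one condition explains both fields."""
    return trigger_holds(
        target_rows, target_field, state_value, condition
    ) and trigger_holds(source_rows, source_field, state_value, condition)


def paid_in_time(row: Row) -> bool:
    """Hypothetical set condition on transaction amount and time."""
    return row["amount"] > 0 and row["minutes_to_pay"] <= 30
```

A source state field that shows a different state under the same amount and time would fail the check and would not be reported as the source of the target state field.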
Accordingly, in step 2064, matching the target field characteristics against the field characteristics of each source field of the same field type in the source data tables of the same type may consist of comparing whether the field characteristics are the same, or of applying some operation (such as addition or subtraction) to the field characteristics and determining whether the result satisfies a given condition. The specific matching process may be determined by the field type.
Optionally, in a specific embodiment, in step 204, determining the target field characteristics corresponding to the target field to be probed in the target data table specifically includes:
Determining the field type of the target field; and determining the target field characteristics corresponding to the target field based on the field characteristic determination rule corresponding to that field type.
The field types include a numeric field, a string field, an enumerable field, or a state field.
Of course, in the embodiments of this specification the field types are divided along only one dimension; in a specific implementation they may also be divided along other dimensions. This is only an example and does not limit the embodiments of this specification. It should also be noted that the embodiments of this specification list only several possible field types. The listed field types are not exhaustive, and other field types are not enumerated one by one here.
Optionally, in a specific embodiment, determining the target field characteristic corresponding to the target field based on the field characteristic determination rule corresponding to the field type may include any one of the following determination manners:
determining each field value corresponding to the target field as the target field characteristic;
determining the trigger condition corresponding to each field value of the target field as the target field characteristic;
and taking a specified operation result over the field values corresponding to the target field as the target field characteristic.
For example, in one embodiment, for a string-type field, the field values of the field may be determined as its field characteristic; for a state-type field, the trigger condition corresponding to each field value may be determined as its field characteristic; for a numeric field, a specified operation result over its field values (such as a summation or a multiple operation) may be determined as its field characteristic; and for an enumerable field, the field values of the field may be determined as its field characteristic.
Of course, specific determination manners of field characteristics corresponding to each type of field are only exemplified here, and do not constitute a limitation on the embodiments of the present specification.
In addition, in this embodiment of the present specification, the trigger condition corresponding to the field value may be a field value corresponding to one or more other fields in the target data table.
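The three determination manners can be sketched as one function that dispatches on field type. This is an illustrative sketch: the row representation, the choice of summation as the specified operation, and the encoding of a trigger condition as the set of observed value combinations of other fields are all assumptions made here.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def field_feature(
    rows: List[Row],
    field: str,
    field_type: str,
    trigger_fields: Sequence[str] = (),
):
    """Derive a field characteristic according to the field type."""
    if field_type in ("string", "enumerable"):
        # The field values themselves serve as the characteristic.
        return tuple(row[field] for row in rows)
    if field_type == "numeric":
        # A specified operation result; summation is one option named in the text.
        return sum(row[field] for row in rows)
    if field_type == "state":
        # Trigger condition per state value, encoded as the combinations of
        # other-field values observed together with that state value.
        feature: Dict[object, set] = {}
        for row in rows:
            combo = tuple(row[name] for name in trigger_fields)
            feature.setdefault(row[field], set()).add(combo)
        return feature
    raise ValueError(f"unsupported field type: {field_type}")
```

The state branch follows the remark that a trigger condition may be expressed through the field values of one or more other fields in the same table.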
Optionally, in a specific implementation, the method provided in the embodiments of the present specification may be applied to the technical field of supervision or the technical field of compliance.
For example, if the method provided in the embodiments of this specification is applied to the technical field of compliance, the business result table may be selected from data domains such as users, merchants, institutions, finance and products; the financial behavior may be derived from changes in subject information, fund information, transaction information and the like; and the environment in which the financial behavior occurs may specifically be a source, a geographic location, a behavior context, or the like. This is only an example and does not limit the embodiments of this specification.
The data exploration method provided by the embodiments of this specification has at least the following beneficial effects. First, the data exploration model is trained on the association relationship between the field characteristics of fields in the target-end data table and those of the corresponding fields in the source-end data table. When probing for the source-end field that corresponds to a field in the target-end data table, the trained model can therefore locate the source field directly from the field characteristics of the field to be probed, which avoids tracing the target data table layer by layer through its data lineage, takes little time, and significantly improves efficiency. Second, the target-end data table is probed automatically by the data exploration model, so no manual analysis is needed and a large amount of labor can be saved. Third, because probing is performed by a model, the probing is more objective. Fourth, the data table type of the target data table is determined before probing, so the target data table is probed only against source data tables of the same type; this narrows the probing range, reduces the subsequent workload, and improves probing efficiency.
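To make the narrowing of the probing range concrete, the following minimal sketch classifies tables by matching field names against type keywords and keeps only source tables of the target table's type. The keyword lists, table names, and the "most keyword hits wins" rule are hypothetical choices for illustration; the specification only requires that a set classification rule be applied.

```python
from collections import defaultdict

# Hypothetical type keywords; the specification does not list concrete ones.
TYPE_KEYWORDS = {
    "transaction": ("trade", "order", "pay"),
    "user": ("user", "member"),
}


def table_type(field_names, type_keywords=TYPE_KEYWORDS):
    """Return the table type whose keywords match the most field names."""
    hits = {
        type_name: sum(
            any(keyword in name.lower() for keyword in keywords)
            for name in field_names
        )
        for type_name, keywords in type_keywords.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def group_by_type(tables):
    """Build the source data table set for each data table type."""
    groups = defaultdict(list)
    for table_name, field_names in tables.items():
        groups[table_type(field_names)].append(table_name)
    return dict(groups)


source_tables = {
    "src_orders": ["order_id", "pay_amount", "trade_time"],
    "src_members": ["user_id", "member_level"],
    "src_refunds": ["order_id", "pay_channel"],
}
source_sets = group_by_type(source_tables)

# Only source tables of the same type as the target table are probed.
target_type = table_type(["trade_no", "pay_status"])
candidates = source_sets.get(target_type, [])
```

In this sketch the target table is classified as `transaction`, so `src_members` is excluded before any field-level matching takes place.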
On the basis of the same idea, corresponding to the method provided by the embodiment of the present specification, the embodiment of the present specification further provides a data exploration apparatus for executing the method provided by the embodiment of the present specification. Fig. 4 is a schematic block diagram of a data probing apparatus provided in an embodiment of the present disclosure, and as shown in fig. 4, the apparatus at least includes:
a first determining module 402, configured to determine, according to a set classification rule, a data table type to which a target data table that needs to be probed belongs;
a second determining module 404, configured to determine a target field characteristic corresponding to a target field that needs to be probed in the target data table;
a probing module 406, configured to probe a source field corresponding to the target field from a source data table belonging to the same type as the target data table based on the target field characteristics and a pre-trained data probing model;
the data exploration model is obtained by training based on the incidence relation between the field characteristics of the target end field and the field characteristics of the corresponding source end field.
The apparatus provided in the embodiments of this specification may implement all the method steps in the method embodiments shown in fig. 1 to 3, and therefore, the functions that can be implemented by the apparatus and the specific implementation processes corresponding to the functions corresponding to the modules may refer to the embodiments shown in fig. 1 to 3, which are not described herein again.
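As a rough sketch of what the probing module 406 might do, the code below applies one probing rule per field type to candidate source fields of the same field type. The rules here are hand-written stand-ins (a Jaccard-overlap threshold of 0.8 for string fields, equality of the operation result for numeric fields) and are assumptions of this sketch; in the specification the data exploration model is obtained by training. The names `PROBING_RULES`, `probe_source_fields`, and the sample fields are likewise hypothetical.

```python
def jaccard(a, b):
    """Overlap ratio between two sets of field values."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical probing rules, one per field type; a trained model would supply these.
PROBING_RULES = {
    "string": lambda target, source: jaccard(target, source) >= 0.8,
    "numeric": lambda target, source: target == source,
}


def probe_source_fields(field_type, target_feature, source_fields,
                        rules=PROBING_RULES):
    """Return source fields of the same field type whose features satisfy the rule."""
    rule = rules[field_type]
    return [
        name
        for name, (source_type, source_feature) in source_fields.items()
        if source_type == field_type and rule(target_feature, source_feature)
    ]


# Candidate fields from source data tables of the same table type (sample data).
source_fields = {
    "src_orders.pay_amount": ("numeric", 250),
    "src_orders.fee": ("numeric", 12),
    "src_orders.currency": ("string", frozenset({"CNY", "USD"})),
}
numeric_matches = probe_source_fields("numeric", 250, source_fields)
string_matches = probe_source_fields(
    "string", frozenset({"CNY", "USD"}), source_fields
)
```

Restricting the comparison to source fields of the same field type keeps each rule simple, since a rule only ever sees features of the kind it was written for.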
The data exploration apparatus provided by the embodiments of this specification has at least the same beneficial effects as the method: the trained data exploration model locates the source field directly from the field characteristics of the field to be probed, which avoids layer-by-layer tracing through data lineage, takes little time, and significantly improves efficiency; probing is automatic, so no manual analysis is needed and a large amount of labor can be saved; probing by a model is more objective; and determining the data table type of the target data table first restricts probing to source data tables of the same type, which narrows the probing range, reduces the subsequent workload, and improves probing efficiency.
Further, based on the methods shown in fig. 1 to fig. 3, the present specification embodiment also provides a data probing apparatus, as shown in fig. 5.
The data probing apparatus may vary considerably in configuration or performance, and may include one or more processors 501 and a memory 502, where the memory 502 may store one or more application programs or data. The memory 502 may be transient or persistent storage. An application program stored in the memory 502 may include one or more modules (not shown), each of which may include a series of computer-executable instruction information for the data probing apparatus. Further, the processor 501 may be configured to communicate with the memory 502 and to execute, on the data probing apparatus, the series of computer-executable instruction information in the memory 502. The data probing apparatus may also include one or more power supplies 503, one or more wired or wireless network interfaces 504, one or more input-output interfaces 505, one or more keyboards 506, and the like.
In one particular embodiment, the data probing apparatus comprises a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may comprise one or more modules, and each module may comprise a series of computer-executable instruction information for the data probing apparatus, and the one or more programs configured to be executed by the one or more processors comprise computer-executable instruction information for:
determining the data table type of a target data table to be probed according to a set classification rule;
determining target field characteristics corresponding to target fields needing to be probed in the target data table;
based on the target field characteristics and a pre-trained data exploration model, exploring a source field corresponding to the target field from a source data table belonging to the same type as the target data table;
the data exploration model is obtained by training based on the incidence relation between the field characteristics of the target end field and the field characteristics of the corresponding source end field.
Therefore, a specific implementation process corresponding to the functions that can be implemented by the device may refer to the embodiment shown in fig. 1 to 3, and details are not repeated here.
The data probing device provided by the embodiments of this specification has at least the same beneficial effects as the method: the trained data exploration model locates the source field directly from the field characteristics of the field to be probed, which avoids layer-by-layer tracing through data lineage, takes little time, and significantly improves efficiency; probing is automatic, so no manual analysis is needed and a large amount of labor can be saved; probing by a model is more objective; and determining the data table type of the target data table first restricts probing to source data tables of the same type, which narrows the probing range, reduces the subsequent workload, and improves probing efficiency.
Further, based on the methods shown in fig. 1 to fig. 3, the embodiments of this specification also provide a storage medium for storing computer-executable instruction information. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instruction information stored on the storage medium, when executed by a processor, implements the following process:
determining the data table type of a target data table to be probed according to a set classification rule;
determining target field characteristics corresponding to target fields needing to be probed in the target data table;
based on the target field characteristics and a pre-trained data exploration model, exploring a source field corresponding to the target field from a source data table belonging to the same type as the target data table;
the data exploration model is obtained by training based on the incidence relation between the field characteristics of the target end field and the field characteristics of the corresponding source end field.
As for the computer-executable instruction information stored in the storage medium provided in the embodiment of the description, when being executed by the processor, all the method steps in the method embodiments shown in fig. 1 to 3 may be implemented, and therefore, a specific implementation process corresponding to a function that can be implemented by the computer-executable instruction information stored in the storage medium when being executed by the processor may refer to the embodiment shown in fig. 1 to 3, and is not described herein again.
The computer-executable instruction information stored on the storage medium provided by the embodiments of this specification, when executed by a processor, has at least the same beneficial effects as the method: the trained data exploration model locates the source field directly from the field characteristics of the field to be probed, which avoids layer-by-layer tracing through data lineage, takes little time, and significantly improves efficiency; probing is automatic, so no manual analysis is needed and a large amount of labor can be saved; probing by a model is more objective; and determining the data table type of the target data table first restricts probing to source data tables of the same type, which narrows the probing range, reduces the subsequent workload, and improves probing efficiency.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In the 1990s, an improvement in a technology could be clearly distinguished as either an improvement in hardware (e.g., an improvement in a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement in a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements in hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement in a method flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer "integrates" a digital system onto a single PLD by programming it, without requiring a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Furthermore, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, and the source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained simply by lightly logic-programming the method flow in one of the hardware description languages described above and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be considered a hardware component, and the means included in it for performing the various functions may also be considered structures within the hardware component. The means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided by function into various units, which are described separately. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instruction information. These computer program instruction information may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instruction information executed by the processor of the computer or other programmable data processing apparatus produce means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instruction information may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instruction information stored in the computer-readable memory produce an article of manufacture including instruction information means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instruction information may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instruction information executed on the computer or other programmable apparatus provides steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instruction information, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The application may be described in the general context of computer-executable instruction information, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this specification are described in a progressive manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described only briefly; for the relevant points, refer to the corresponding description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (14)

1. A method of data exploration, the method comprising:
determining the data table type of a target data table to be probed according to a set classification rule;
determining target field characteristics corresponding to target fields needing to be probed in the target data table;
based on the target field characteristics and a pre-trained data exploration model, exploring a source field corresponding to the target field from a source data table belonging to the same type as the target data table;
the data exploration model is obtained by training based on the incidence relation between the field characteristics of the target end field and the field characteristics of the corresponding source end field.
2. The method according to claim 1, wherein the data probing model includes probing rules corresponding to each field type, and the probing rules are used to describe matching relationships satisfied between field features of fields having association relationships;
the exploring, based on the target field characteristics and a pre-trained data exploration model, a source field corresponding to the target field from a source data table belonging to the same type as the target data table comprises:
determining a target exploration rule corresponding to the target field in the data exploration model according to the field type of the target field;
matching the target field characteristics with field characteristics corresponding to source fields of the same field type in a source data table of the same type to obtain matching results corresponding to the target field and source fields of the same field type in the source data table of the same type;
and determining the source field corresponding to the matching result meeting the target probing rule as the source field corresponding to the target field.
3. The method according to claim 1, wherein the determining the target field characteristics corresponding to the target fields needing to be probed in the target data table comprises:
determining the field type of the target field, wherein the field type comprises any one of a numeric field, a string-type field, an enumerable field, or a state-type field;
and determining the target field characteristics corresponding to the target field based on the field characteristic determination rule corresponding to the field type.
4. The method according to claim 3, wherein the determining, based on the field characteristic determination rule corresponding to the field type, a target field characteristic corresponding to the target field includes any one of the following determination manners:
determining each field value corresponding to the target field as the target field characteristic;
determining the triggering conditions corresponding to the field values corresponding to the target fields as the characteristics of the target fields;
and taking the appointed operation result of each field value corresponding to the target field as the target field characteristic.
5. The method according to claim 1, wherein before the determining, according to a set classification rule, the data table type of the target data table to be probed, the method further comprises:
and classifying the source data table according to the classification rule to obtain a source data table set corresponding to each data table type.
6. The method according to claim 1, wherein the determining the type of the data table to which the target data table to be probed belongs according to the set classification rule includes:
matching the field names of all fields in the target data table with type keywords corresponding to all the data table types;
and determining the data table type of the target data table according to the matching result.
7. An apparatus for data exploration, the apparatus comprising:
the first determining module is used for determining the data table type of a target data table to be probed according to a set classification rule;
the second determining module is used for determining the target field characteristics corresponding to the target fields needing to be probed in the target data table;
the probing module probes a source field corresponding to the target field from a source data table which belongs to the same type as the target data table on the basis of the target field characteristics and a pre-trained data probing model;
the data exploration model is obtained by training based on the incidence relation between the field characteristics of the target end field and the field characteristics of the corresponding source end field.
8. The apparatus according to claim 7, wherein the data exploration model includes exploration rules corresponding to respective field types, and the exploration rules are used for describing matching relationships satisfied between field features of fields having association relationships;
correspondingly, the probing module comprises:
the first determining unit is used for determining a target exploration rule corresponding to the target field in the data exploration model according to the field type of the target field;
the first matching unit is used for matching the target field characteristics with the field characteristics corresponding to each source field of the same field type in the source data table of the same type to obtain matching results corresponding to the target field and each source field of the same field type in the source data table of the same type;
and the second determining unit is used for determining the source field corresponding to the matching result meeting the target probing rule as the source field corresponding to the target field.
9. The apparatus of claim 7, the second determination module, comprising:
a third determining unit, configured to determine the field type of the target field, wherein the field type comprises any one of a numeric field, a string-type field, an enumerable field, or a state-type field;
and the fourth determining unit is used for determining the target field characteristics corresponding to the target fields based on the field characteristic determining rules corresponding to the field types.
10. The apparatus of claim 9, the fourth determination unit comprising any one of:
the first determining subunit determines each field value corresponding to the target field as the target field characteristic;
a second determining subunit, configured to determine, as the target field feature, a trigger condition corresponding to each field value corresponding to the target field;
and the third determining subunit takes the specified operation result of each field value corresponding to the target field as the target field characteristic.
11. The apparatus of claim 7, further comprising:
and the dividing module is used for carrying out category division on the source data table according to the classification rule to obtain a source data table set corresponding to each data table type.
12. The apparatus of claim 7, the first determination module, comprising:
the second matching unit is used for matching the field names of all the fields in the target data table with the type keywords corresponding to all the data table types;
and the fifth determining unit is used for determining the data table type of the target data table according to the matching result.
13. A data exploration apparatus, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
determining the data table type of a target data table to be probed according to a set classification rule;
determining target field characteristics corresponding to target fields needing to be probed in the target data table;
based on the target field characteristics and a pre-trained data exploration model, exploring a source field corresponding to the target field from a source data table belonging to the same type as the target data table;
the data exploration model is obtained by training based on the incidence relation between the field characteristics of the target end field and the field characteristics of the corresponding source end field.
14. A storage medium storing computer-executable instructions that, when executed, implement the following:
determining the data table type of a target data table to be probed according to a set classification rule;
determining target field characteristics corresponding to a target field to be probed in the target data table; and
based on the target field characteristics and a pre-trained data exploration model, probing a source data table of the same type as the target data table for a source field corresponding to the target field;
wherein the data exploration model is trained based on the association between the field characteristics of target-side fields and the field characteristics of the corresponding source-side fields.
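The probing step recited in claims 13 and 14 might look roughly like the sketch below. Everything in it is an assumption: the `Field` record, the two pairwise features (name similarity and data-type equality), and `stand_in_model`, a fixed weighted sum used only as a placeholder where the claims call for a pre-trained data exploration model. The one point taken from the claims is that candidates are restricted to source tables of the same type as the target table.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Field:
    table: str
    name: str
    dtype: str


def pair_features(target, source):
    """Hypothetical features for a (target field, source field) pair."""
    name_sim = SequenceMatcher(None, target.name.lower(), source.name.lower()).ratio()
    same_dtype = 1.0 if target.dtype == source.dtype else 0.0
    return (name_sim, same_dtype)


def stand_in_model(features):
    """Placeholder for the pre-trained model: fixed weights, nothing learned."""
    name_sim, same_dtype = features
    return 0.8 * name_sim + 0.2 * same_dtype


def probe_source_field(target, source_tables_by_type, target_type, model=stand_in_model):
    """Return the best-scoring source field among tables of the target's type.

    source_tables_by_type maps a table type to {table name: [Field, ...]}.
    Returns None when no source table of that type exists.
    """
    same_type_tables = source_tables_by_type.get(target_type, {})
    candidates = [f for fields in same_type_tables.values() for f in fields]
    if not candidates:
        return None
    return max(candidates, key=lambda s: model(pair_features(target, s)))
```

Restricting the search to one table type keeps an identically named field in an unrelated table (for example a `user_name` column in an order table) out of the candidate list.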
CN202010980079.4A 2020-09-17 2020-09-17 Data probing method and device Pending CN112182116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010980079.4A CN112182116A (en) 2020-09-17 2020-09-17 Data probing method and device

Publications (1)

Publication Number Publication Date
CN112182116A (en) 2021-01-05

Family

ID=73920031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010980079.4A Pending CN112182116A (en) 2020-09-17 2020-09-17 Data probing method and device

Country Status (1)

Country Link
CN (1) CN112182116A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032494A (en) * 2021-03-08 2021-06-25 浙江大华技术股份有限公司 Data table classification and model training method, device, equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881427A (en) * 2015-04-01 2015-09-02 北京科东电力控制系统有限责任公司 Data lineage analysis method for power grid regulation and control operation
US20160070724A1 (en) * 2014-09-08 2016-03-10 International Business Machines Corporation Data quality analysis and cleansing of source data with respect to a target system
CN111209538A (en) * 2020-01-03 2020-05-29 北京明略软件系统有限公司 Table data quality probing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
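王芳 (Wang Fang); 赵洪 (Zhao Hong); et al.: "Progress in research and practice on data provenance from a data science perspective" (数据科学视角下数据溯源研究与实践进展), Journal of Library Science in China (中国图书馆学报), no. 5, 7 November 2019 (2019-11-07), pages 79-100 *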

Similar Documents

Publication Publication Date Title
TWI718643B (en) Method and device for identifying abnormal groups
CN110162796B (en) News thematic creation method and device
CN108346107B (en) Social content risk identification method, device and equipment
CN110162778B (en) Text abstract generation method and device
CN112199416A (en) Data rule generation method and device
CN115774552A (en) Configurated algorithm design method and device, electronic equipment and readable storage medium
CN109656946B (en) Multi-table association query method, device and equipment
US20070214107A1 (en) Dynamic materialized view ranging
CN114372566A (en) Augmentation of graph data, graph neural network training method, device and equipment
CN112182116A (en) Data probing method and device
CN107577660B (en) Category information identification method and device and server
CN111209277B (en) Data processing method, device, equipment and medium
CN116822606A (en) Training method, device, equipment and storage medium of anomaly detection model
CN108681490B (en) Vector processing method, device and equipment for RPC information
CN115934161A (en) Code change influence analysis method, device and equipment
CN110245136B (en) Data retrieval method, device, equipment and storage equipment
CN110321433B (en) Method and device for determining text category
CN109325127B (en) Risk identification method and device
CN109903165B (en) Model merging method and device
CN112925955A (en) Information processing method, device and equipment
CN113344197A (en) Training method of recognition model, service execution method and device
CN111596946A (en) Recommendation method, device and medium for intelligent contracts of block chains
CN111782813A (en) User community evaluation method, device and equipment
CN115017915B (en) Model training and task execution method and device
CN111461352B (en) Model training method, service node identification device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination