CN117609237A - Data table field attribute determining method, device, equipment and storage medium - Google Patents

Info

Publication number
CN117609237A
Authority
CN
China
Prior art keywords
attribute
data table
target
target field
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311643243.2A
Other languages
Chinese (zh)
Inventor
刘惠民
李蓉娴
张然
孙琳
廖梦杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202311643243.2A
Publication of CN117609237A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device, equipment and a storage medium for determining field attributes of a data table. For each source data table, the attribute identifiers of all target fields in the current source data table, the attribute statistics times of the attribute identifiers, and whether each target field corresponds to a downstream data table are determined; for each target field, if a downstream data table corresponding to the current target field exists, the attribute statistics times of each attribute identifier of the current target field are superimposed onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table; if the corresponding target field in the corresponding downstream data table itself corresponds to a downstream data table, that target field is taken as the current target field and the superposition step is repeated until no downstream data table corresponding to the current target field exists; and the attribute identifier of each target field in each target data table is determined according to the attribute statistics times of the attribute identifiers of that target field in the target data table. The method and the device can improve the field attribute labeling efficiency.

Description

Data table field attribute determining method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining a field attribute of a data table.
Background
The attributes of the fields are typically considered when processing the target data table. The prior art generally adopts a manual labeling mode to determine the attribute of each field of the target data table, so that the labor cost is high and the efficiency is low.
Disclosure of Invention
The invention provides a method, a device, equipment and a storage medium for determining field attributes of a data table, which are used for solving the problem of low field attribute labeling efficiency in the existing data processing method.
According to an aspect of the present invention, there is provided a data table field attribute determining method, including:
determining a data table set, wherein the data table set comprises at least one source data table and at least two non-source data tables, and the at least two non-source data tables comprise an intermediate data table and a target data table;
for each source data table, determining the attribute identifiers of all target fields in the current source data table, the attribute statistics times of the attribute identifiers, and whether each target field corresponds to a downstream data table;
for each target field, if a downstream data table corresponding to the current target field exists among the at least two non-source data tables, superimposing the attribute statistics times of each attribute identifier of the current target field onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table;
if the corresponding target field in the corresponding downstream data table itself corresponds to a downstream data table, taking that target field as the current target field, and returning to the step of superimposing the attribute statistics times of each attribute identifier of the current target field onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table, until no downstream data table corresponding to the current target field exists among the at least two non-source data tables;
and determining the attribute identifier of each target field in each target data table according to the attribute statistics times of the attribute identifiers of that target field in the target data table.
According to another aspect of the present invention, there is provided a data table field attribute determining apparatus, including:
the data table determining module is used for determining a data table set, wherein the data table set comprises at least one source data table and at least two non-source data tables, and the at least two non-source data tables comprise an intermediate data table and a target data table;
the starting module is used for determining, for each source data table, the attribute identifiers of all target fields in the current source data table, the attribute statistics times of the attribute identifiers, and whether each target field corresponds to a downstream data table;
the propagation module is used for, for each target field, superimposing the attribute statistics times of each attribute identifier of the current target field onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table if a downstream data table corresponding to the current target field exists among the at least two non-source data tables;
the iteration module is used for, if the corresponding target field in the corresponding downstream data table itself corresponds to a downstream data table, taking that target field as the current target field and returning to the step of superimposing the attribute statistics times of each attribute identifier of the current target field onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table, until no downstream data table corresponding to the current target field exists among the at least two non-source data tables;
and the determining module is used for determining the attribute identifier of each target field in each target data table according to the attribute statistics times of the attribute identifiers of that target field in the target data table.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data table field attribute determination method of any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement a data table field attribute determining method according to any one of the embodiments of the present invention when executed.
According to the technical scheme of the data table field attribute determining method provided by the embodiment of the invention, for each source data table, the attribute identifiers of all target fields in the current source data table, the attribute statistics times of the attribute identifiers, and whether each target field corresponds to a downstream data table are determined; for each target field, if a downstream data table corresponding to the current target field exists among the at least two non-source data tables, the attribute statistics times of each attribute identifier of the current target field are superimposed onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table; if the corresponding target field in the corresponding downstream data table itself corresponds to a downstream data table, that target field is taken as the current target field, and the superposition step is repeated until no downstream data table corresponding to the current target field exists among the at least two non-source data tables; and the attribute identifier of each target field in each target data table is determined according to the attribute statistics times of the attribute identifiers of that target field in the target data table. The technical effect of simply, accurately and quickly determining the attribute identifier of each target field in the target data table is thereby achieved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for determining attributes of fields of a data table provided in accordance with an embodiment of the present invention;
FIG. 2 is a schematic representation of attribute identification propagation provided in accordance with an embodiment of the present invention;
FIG. 3 is a further flowchart of a method for determining attributes of fields of a data table provided in accordance with an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data table field attribute determining apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device implementing a method for determining attributes of fields in a data table according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the term "object" and the like in the description of the present invention and the claims and the above drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a flowchart of a method for determining attributes of fields in a data table according to an embodiment of the present invention, where the method may be performed by a data table field attribute determining device, which may be implemented in hardware and/or software, and the data table field attribute determining device may be configured in a processor of an electronic device, where the method is applicable to determining the attributes of corresponding fields in a target data table based on the attributes of each target field in a source data table. As shown in fig. 1, the method includes:
S110, determining a data table set, wherein the data table set comprises at least one source data table and at least two non-source data tables, and the at least two non-source data tables comprise an intermediate data table and a target data table.
The source data table refers to a data table without a corresponding parent data table.
The non-source data table refers to a data table which is not a source data table, such as an intermediate data table and a target data table.
The target data table is a data table of interest to the user: none of its fields (or none of a set of designated fields) has a corresponding downstream data table, but at least one of its fields has a corresponding upstream data table.
The intermediate data table is a data table connecting a source data table and a target data table. For any pair of a source data table and a target data table, one or more intermediate data tables connect the two.
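The table roles defined above can be derived mechanically once lineage is known. The following is a minimal sketch, assuming lineage is available as (upstream table, downstream table) pairs. The function name `classify_tables` and the table-level granularity are illustrative assumptions; the description itself defines the roles per field.

```python
from collections import defaultdict


def classify_tables(edges):
    """Split tables into source, intermediate and target sets.

    `edges` is an iterable of (upstream_table, downstream_table) pairs.
    Assumption: lineage is tracked per table here, not per field.
    """
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    tables = set()
    for parent, child in edges:
        tables.update((parent, child))
        downstream[parent].add(child)
        upstream[child].add(parent)

    # A source table has no parent table.
    sources = {t for t in tables if not upstream[t]}
    # A target table has an upstream table but no downstream table.
    targets = {t for t in tables if upstream[t] and not downstream[t]}
    # Everything else connects sources to targets.
    intermediates = tables - sources - targets
    return sources, intermediates, targets


sources, intermediates, targets = classify_tables(
    [("A", "C"), ("C", "D"), ("B", "D"), ("D", "F")]
)
print(sorted(sources), sorted(intermediates), sorted(targets))
```

In this hypothetical input, A and B have no parent and are source tables, F has no downstream table and is the target table, and C and D are intermediate tables.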
S120, for each source data table, determining the attribute identifiers of all target fields in the current source data table, the attribute statistics times of the attribute identifiers, and whether each target field corresponds to a downstream data table.
The target field is a field corresponding to the attribute identifier.
The attribute statistics times of an attribute identifier of a target field in a source data table is 0 or 1. For example, as shown in Fig. 2, data table A is a source data table and field A is a target field that carries the sensitive identifier of the sensitive combination, so the attribute statistics times of the sensitive identifier of the target field is 1 and that of the non-sensitive identifier is 0.
Specifically, each target field in the source data table carries at least one attribute identifier of an attribute combination, and in each attribute combination, the attribute statistics times of one attribute identifier are 1, and the attribute statistics times of other attribute identifiers are 0.
Each attribute combination comprises at least two attribute identifiers, all of which correspond to the same attribute. For example, the sensitive combination includes a sensitive identifier and a non-sensitive identifier.
In one embodiment, each target field carries an attribute identifier of at least one attribute combination among a sensitive combination, a security level combination, an execution standard combination and an associated dictionary combination. Optionally, each target field carries attribute identifiers of all four combinations; within each attribute combination, the attribute statistics times of one attribute identifier is 1 and that of the other attribute identifiers is 0.
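The initial state of a source-table field can be written down directly from this rule. The sketch below is illustrative only: the dictionary layout, the function names and the security level identifiers (`level_1` to `level_3`) are assumptions, since the description names the combinations but does not enumerate their identifiers.

```python
# Hypothetical attribute combinations; only the sensitive combination's
# two identifiers are taken from the description.
ATTRIBUTE_COMBINATIONS = {
    "sensitivity": ("sensitive", "non_sensitive"),
    "security_level": ("level_1", "level_2", "level_3"),
}


def init_combination(combination):
    """Return an initialized combination: every identifier counted 0 times."""
    return {name: 0 for name in ATTRIBUTE_COMBINATIONS[combination]}


def label_source_field(labels):
    """Build the counts of a source-table field.

    `labels` maps a combination name to the identifier the field carries.
    The carried identifier gets count 1, all others in the combination 0.
    """
    counts = {}
    for combination, carried in labels.items():
        if carried not in ATTRIBUTE_COMBINATIONS[combination]:
            raise ValueError(f"{carried!r} is not in {combination!r}")
        counts[combination] = init_combination(combination)
        counts[combination][carried] = 1
    return counts


field_a = label_source_field({"sensitivity": "sensitive"})
print(field_a)  # {'sensitivity': {'sensitive': 1, 'non_sensitive': 0}}
```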
For each target field, if a field of any one of the at least two non-source data tables is determined based on the current target field, that non-source data table is a downstream data table of the current target field.
S130, for each target field, if a downstream data table corresponding to the current target field exists among the at least two non-source data tables, superimposing the attribute statistics times of each attribute identifier of the current target field onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table.
If the current target field has a corresponding downstream data table, the corresponding field of the downstream data table is judged to have a lineage dependency on the current target field. The attribute statistics times of each attribute identifier of the current target field are therefore passed to the corresponding field of the corresponding downstream data table by propagation along the lineage relationship. Specifically, the attribute statistics times of each attribute identifier of the current target field are superimposed onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table.
As shown in Fig. 2, data table A is a source data table, and field A therein is the current target field and carries the sensitive identifier. Since data table A is a source data table, the attribute statistics times of the sensitive identifier is 1. Field A in data table C is determined based on the current target field, so data table C is a downstream data table of the current target field. If field A in data table C has no attribute identifier of the sensitive combination, an initialized sensitive combination is added to it; the initialized sensitive combination comprises a sensitive identifier and a non-sensitive identifier, whose attribute statistics times are both 0. The attribute statistics times 1 of the sensitive identifier of the current target field is then added to the attribute statistics times of the sensitive identifier in the initialized sensitive combination, so that the attribute statistics times of the sensitive identifier of field A of data table C becomes 1 while that of the non-sensitive identifier remains 0.
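The single superposition step from data table A to data table C can be sketched as follows. This is a minimal illustration under assumed data structures: each field's state is a nested dictionary `{combination: {identifier: count}}`, and the function name `superimpose` is hypothetical.

```python
def superimpose(upstream_counts, downstream_counts):
    """Add the upstream field's counts onto the downstream field's counts.

    A combination missing from the downstream field is first initialized
    with every identifier counted 0 times, as in the walkthrough above.
    """
    for combination, identifiers in upstream_counts.items():
        target = downstream_counts.setdefault(
            combination, {name: 0 for name in identifiers}
        )
        for name, count in identifiers.items():
            target[name] = target.get(name, 0) + count
    return downstream_counts


# Field A of source data table A carries the sensitive identifier.
field_a_in_table_a = {"sensitivity": {"sensitive": 1, "non_sensitive": 0}}
# Field A of data table C has no sensitive combination yet.
field_a_in_table_c = {}

superimpose(field_a_in_table_a, field_a_in_table_c)
print(field_a_in_table_c)  # {'sensitivity': {'sensitive': 1, 'non_sensitive': 0}}
```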
S140, if the corresponding target field in the corresponding downstream data table itself corresponds to a downstream data table, taking that target field as the current target field, and returning to the step of superimposing the attribute statistics times of each attribute identifier of the current target field onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table, until no downstream data table corresponding to the current target field exists among the at least two non-source data tables.
This step iteratively propagates the attribute identifiers of each target field in the source data table to the downstream data tables associated with that target field.
As shown in Fig. 2, field A in data table D and field A in data table E are both determined based on field A of data table C, so field A in data table C is taken as the current target field. Data table D and data table E are both downstream data tables corresponding to the current target field.
For data table D, if field A of data table D has no sensitive combination, an initialized sensitive combination is added to it, comprising a sensitive identifier and a non-sensitive identifier whose attribute statistics times are both 0. The attribute statistics times 1 of the sensitive identifier of the current target field is added to the attribute statistics times of the sensitive identifier in the initialized sensitive combination, so that the attribute statistics times of the sensitive identifier of field A of data table D becomes 1 and that of the non-sensitive identifier remains 0. By analogy, the attribute statistics times of the sensitive identifier in the sensitive combinations of data table F, data table G, data table E and data table N becomes 1, and that of the non-sensitive identifier remains 0.
For data table E, field A of data table E already has a sensitive combination, and the attribute statistics times of the sensitive identifier of field A of data table C is 1 while that of the non-sensitive identifier is 0. Therefore 1 is added to the attribute statistics times of the sensitive identifier of field A of data table E, which becomes 2, while that of the non-sensitive identifier remains 0. By analogy, the attribute statistics times of the sensitive identifier of data table N becomes 2, and that of the non-sensitive identifier remains 0.
Data table B and data table T are both source data tables. Field A of data table B carries the non-sensitive identifier, and field A of data table T carries the sensitive identifier. In addition to field A of data table C, field A of data table D is also determined based on field A of data table B and field A of data table T, so data table D is a downstream data table of field A of data table B and of field A of data table T.
The attribute statistics times 1 of the non-sensitive identifier of field A of data table B is superimposed onto the attribute statistics times of the non-sensitive identifier of field A of data table D, so that the attribute statistics times of the non-sensitive identifier of field A of data table D becomes 1 while that of the sensitive identifier remains 1. According to step S130, the attribute statistics times of the non-sensitive identifier of field A in data table F is updated to 1, and that of the sensitive identifier remains 1; the attribute statistics times of the non-sensitive identifier of field A in data table G is updated to 1, and that of the sensitive identifier remains 1; the attribute statistics times of the non-sensitive identifier of field A in data table E is updated to 1, and that of the sensitive identifier remains 2; and the attribute statistics times of the non-sensitive identifier of field A in data table N is updated to 1, and that of the sensitive identifier remains 2.
The attribute statistics times 1 of the sensitive identifier of field A of data table T is superimposed onto the attribute statistics times of the sensitive identifier of field A of data table D, so that the attribute statistics times of the sensitive identifier of field A of data table D becomes 2 while that of the non-sensitive identifier remains 1. According to step S130, the attribute statistics times of the sensitive identifier of field A in data table F is updated to 2, and that of the non-sensitive identifier remains 1; the attribute statistics times of the sensitive identifier of field A in data table G is updated to 2, and that of the non-sensitive identifier remains 1; the attribute statistics times of the sensitive identifier of field A in data table E is updated to 3, and that of the non-sensitive identifier remains 1; and the attribute statistics times of the sensitive identifier of field A in data table N is updated to 3, and that of the non-sensitive identifier remains 1.
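The whole walkthrough can be reproduced with a short depth-first traversal. The sketch below rests on two assumptions that should be checked against Fig. 2, which is not reproduced here: the lineage edges in `DOWNSTREAM` are reconstructed from the text (A to C; C to D and E; D to F, G and E; E to N; B and T to D), and each source's count of 1 is carried along every lineage path, which is the reading that yields the numbers stated above. The function and variable names are hypothetical.

```python
from collections import defaultdict

# Assumed lineage of field A, reconstructed from the walkthrough.
DOWNSTREAM = {
    "A": ["C"], "B": ["D"], "T": ["D"],
    "C": ["D", "E"], "D": ["F", "G", "E"], "E": ["N"],
}
# Identifier carried by field A in each source data table.
SOURCE_LABELS = {"A": "sensitive", "B": "non_sensitive", "T": "sensitive"}


def propagate(downstream, source_labels):
    """Propagate each source field's count of 1 along every lineage path.

    Assumes the lineage graph is acyclic; a cycle would recurse forever.
    """
    counts = defaultdict(lambda: {"sensitive": 0, "non_sensitive": 0})

    def visit(table, identifier):
        # Superimpose onto each downstream table, then continue from it.
        for child in downstream.get(table, []):
            counts[child][identifier] += 1
            visit(child, identifier)

    for source, identifier in source_labels.items():
        counts[source][identifier] = 1
        visit(source, identifier)
    return dict(counts)


result = propagate(DOWNSTREAM, SOURCE_LABELS)
for table in ("C", "D", "F", "G", "E", "N"):
    print(table, result[table])
```

Under these assumptions, data tables D, F and G end with a sensitive count of 2 and a non-sensitive count of 1, and data tables E and N end with 3 and 1, matching the values in the text.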
If the corresponding target field in the corresponding downstream data table does not have a corresponding downstream data table, judging that the propagation path of the attribute identification of the current target field in the current source data table in the non-source data table has been traversed, if the current source data table has a target field which is not traversed by the attribute identification propagation path, taking the next target field as the current target field, and executing S130; if the current source data table does not have the target field for which the attribute identification propagation path traversal is not performed, the next source data table is taken as the current source data table, and S120 is performed.
S150, determining the attribute identifier of each target field in each target data table according to the attribute statistics times of the attribute identifiers of that target field in the target data table.
In one embodiment, for each target field in the target data table, the current target field, all valid attribute combinations of the current target field, and the attribute identifier corresponding to the maximum statistics times in each valid attribute combination are determined; for each valid attribute combination, if the attribute statistics times of only one attribute identifier in the current valid attribute combination is the maximum statistics times, the attribute identifier corresponding to the maximum statistics times is used as one attribute identifier of the current target field.
A valid attribute combination is an attribute combination that includes at least one attribute identifier whose attribute statistics times is not 0.
It will be appreciated that the number of attribute identifications of the current target field is the same as the number of valid attribute combinations. The embodiment can simply and quickly determine at least one attribute identifier of the current target field according to the attribute statistics times of each attribute identifier carried by the current target field.
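The selection rule of this embodiment can be sketched as follows, assuming the same nested-dictionary layout `{combination: {identifier: count}}` used for illustration; the function name is hypothetical. Combinations whose maximum is shared by several identifiers are left unresolved here, since this embodiment only covers the case of a unique maximum.

```python
def resolve_unique_maximum(field_counts):
    """Pick, per valid combination, the identifier with the unique maximum count."""
    resolved = {}
    for combination, counts in field_counts.items():
        if not any(counts.values()):
            continue  # not a valid combination: every count is 0
        top = max(counts.values())
        winners = [name for name, count in counts.items() if count == top]
        if len(winners) == 1:
            resolved[combination] = winners[0]
    return resolved


field_a_in_table_n = {
    "sensitivity": {"sensitive": 3, "non_sensitive": 1},
    "security_level": {"level_1": 0, "level_2": 0},  # hypothetical identifiers
}
print(resolve_unique_maximum(field_a_in_table_n))  # {'sensitivity': 'sensitive'}
```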
In one embodiment, when it is detected that all valid target fields in the source data tables have been traversed, the attribute identifier of each target field in the target data table is determined according to the attribute statistics times of the attribute identifiers of all target fields in the target data table, where a valid target field is a target field for which a corresponding downstream data table exists. This embodiment ends the updating of the attribute statistics times of the non-source data tables by setting an iteration termination condition.
According to the technical scheme of the data table field attribute determining method provided by the embodiment of the invention, for each source data table, the attribute identifiers of all target fields in the current source data table, the attribute statistics times of the attribute identifiers, and whether each target field corresponds to a downstream data table are determined; for each target field, if a downstream data table corresponding to the current target field exists among the at least two non-source data tables, the attribute statistics times of each attribute identifier of the current target field are superimposed onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table; if the corresponding target field in the corresponding downstream data table itself corresponds to a downstream data table, that target field is taken as the current target field, and the superposition step is repeated until no downstream data table corresponding to the current target field exists among the at least two non-source data tables; and the attribute identifier of each target field in each target data table is determined according to the attribute statistics times of the attribute identifiers of that target field in the target data table. The technical effect of simply, accurately and quickly determining the attribute identifier of each target field in the target data table is thereby achieved.
Fig. 3 is a further flowchart of a method for determining attributes of fields in a data table according to an embodiment of the present invention, which further refines S150 of the foregoing embodiment. As shown in Fig. 3, the method includes:
S210, determining a data table set, wherein the data table set comprises at least one source data table and at least two non-source data tables, and the at least two non-source data tables comprise an intermediate data table and a target data table.
S220, for each source data table, determining the attribute identifiers of all target fields in the current source data table, the attribute statistics times of the attribute identifiers, and whether each target field corresponds to a downstream data table.
S230, for each target field, if a downstream data table corresponding to the current target field exists among the at least two non-source data tables, superimposing the attribute statistics times of each attribute identifier of the current target field onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table.
S240, if the corresponding target field in the corresponding downstream data table itself corresponds to a downstream data table, taking that target field as the current target field, and returning to the step of superimposing the attribute statistics times of each attribute identifier of the current target field onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table, until no downstream data table corresponding to the current target field exists among the at least two non-source data tables.
S2501, for each target field in the target data table, determining the current target field, all effective attribute combinations of the current target field, and the attribute identifier corresponding to the maximum statistics times in each effective attribute combination.
For each effective attribute combination of the current target field, the attribute identifiers in that combination are sorted by attribute statistics times, and the maximum attribute statistics times and the attribute identifier corresponding to it are determined from the sorted result.
S2502, if the attribute statistics times of at least two attribute identifiers in the current effective attribute combination are the maximum statistics times, taking the attribute identifier with the highest priority among the at least two attribute identifiers as one attribute identifier of the current target field.
If the maximum statistics times corresponds to two attribute identifiers, the two identifiers carry equal weight in the data when the attribute of the current target field is determined. In that case, the attribute priorities of the two identifiers are obtained, and the identifier with the higher attribute priority is taken as one attribute identifier of the current target field. Combining the attribute statistics times with the attribute priorities in this way yields a target attribute identifier for each effective attribute combination, which is taken as one effective identifier of the current target field.
In one embodiment, for the sensitive combination, the priority of the sensitive identifier is higher than the priority of the non-sensitive identifier; for the security level combination, if the first security level is higher than the second security level, the priority of the security level identifier corresponding to the first security level is higher than the priority of the security level identifier corresponding to the second security level; for the execution standard combination, if the first execution standard is higher than the second execution standard, the priority of the execution standard identifier corresponding to the first execution standard is higher than the priority of the execution standard identifier corresponding to the second execution standard.
The embodiment of the invention determines the target attribute identifier of each effective attribute combination by combining the attribute statistics times with the attribute priorities, and takes that target attribute identifier as one effective identifier of the current target field, which improves the accuracy with which the attribute identifiers of the current target field are determined.
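A minimal Python sketch of the selection rule in S2501 and S2502 follows. The function name `pick_identifier`, the identifier strings, and the numeric priority values are hypothetical. The sketch also assumes that an attribute combination counts as effective for a field when at least one of its identifiers has a non-zero count, which the text does not state explicitly.

```python
def pick_identifier(counts, combination, priority):
    """Return the identifier of `combination` with the largest count,
    breaking ties by the larger priority value.

    Returns None when no identifier of the combination was counted,
    i.e. the combination is not effective for this field (assumption).
    """
    counted = [ident for ident in combination if counts.get(ident, 0) > 0]
    if not counted:
        return None
    # Sort by count, largest first, as in the sorting step of S2501.
    ranked = sorted(counted, key=lambda ident: counts[ident], reverse=True)
    max_count = counts[ranked[0]]
    tied = [ident for ident in ranked if counts[ident] == max_count]
    # S2502: among tied identifiers, the highest priority wins.
    return max(tied, key=lambda ident: priority[ident])


# Hypothetical combinations; a larger number means a higher priority.
SENSITIVITY = ("sensitive", "non_sensitive")
SECURITY_LEVEL = ("level_1", "level_2", "level_3")
PRIORITY = {
    "sensitive": 2, "non_sensitive": 1,
    "level_1": 1, "level_2": 2, "level_3": 3,
}

field_counts = {"sensitive": 2, "non_sensitive": 2, "level_1": 3, "level_3": 1}
print(pick_identifier(field_counts, SENSITIVITY, PRIORITY))
print(pick_identifier(field_counts, SECURITY_LEVEL, PRIORITY))
```

In this example the sensitivity counts are tied, so the priority decides in favor of `sensitive`. For the security level combination, `level_1` has the strictly largest count, so it is chosen without consulting the priorities.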
Fig. 4 is a schematic structural diagram of a data table field attribute determining apparatus according to an embodiment of the present invention. As shown in fig. 4, the apparatus includes:
a data table determining module 31, configured to determine a data table set, where the data table set includes at least one source data table and at least two non-source data tables, and the at least two non-source data tables include an intermediate data table and a target data table;
a start module 32, configured to determine, for each of the source data tables, an attribute identifier of each target field in the current source data table, an attribute statistics number of the attribute identifier, and whether each of the target fields corresponds to a downstream data table;
a propagation module 33, configured to, for each of the target fields, if a downstream data table corresponding to a current target field exists in the at least one non-source data table, superimpose an attribute statistics number of each attribute identifier of the current target field onto an attribute statistics number of a corresponding attribute identifier of a corresponding target field in a corresponding downstream data table;
an iteration module 34, configured to, if the corresponding target field in the corresponding downstream data table further corresponds to a downstream data table, take that target field as the current target field, and return to the step of superimposing the attribute statistics times of each attribute identifier of the current target field onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table, until no downstream data table corresponding to the current target field exists in the at least two non-source data tables;
and a determining module 35, configured to determine the attribute identifier of each target field in each target data table according to the attribute statistics times of the attribute identifiers of each target field in the target data table.
In one embodiment, each of the target fields in the source data table carries an attribute identifier of at least one attribute combination, and each of the attribute combinations includes at least two attribute identifiers.
In one embodiment, each of the target fields carries at least one attribute combination of a sensitive combination, a security level combination, an execution standard combination, and an associated dictionary combination.
In one embodiment, the determining module 35 is configured to:
for each target field in the target data table, determining the current target field, all effective attribute combinations of the current target field, and the attribute identifier corresponding to the maximum statistics times in each effective attribute combination;
for each effective attribute combination, if the attribute statistics times of only one attribute identifier in the current effective attribute combination is the maximum statistics times, the attribute identifier corresponding to the maximum statistics times is used as one attribute identifier of the current target field.
In one embodiment, the determining module 35 is further configured to:
and if the attribute statistics times of at least two attribute identifiers in the current effective attribute combination are the maximum statistics times, taking the attribute identifier with the highest priority in the at least two attribute identifiers as one attribute identifier of the current target field.
In one embodiment, for the sensitive combination, the priority of the sensitive identifier is higher than the priority of the non-sensitive identifier; for the security level combination, if the first security level is higher than the second security level, the priority of the security level identifier corresponding to the first security level is higher than the priority of the security level identifier corresponding to the second security level; for the execution standard combination, if the first execution standard is higher than the second execution standard, the priority of the execution standard identifier corresponding to the first execution standard is higher than the priority of the execution standard identifier corresponding to the second execution standard.
In one embodiment, the determining module 35 is specifically configured to:
upon detecting that all effective target fields in the source data table have been traversed, determining the attribute identifier of each target field in each target data table according to the attribute statistics times of the attribute identifiers of each target field in the target data table, wherein an effective target field is a target field that has a corresponding downstream data table.
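The traversal condition placed on the determining module can be sketched as a simple set check. This is an illustrative assumption about how the condition might be detected, not a description of the device itself; `effective_fields`, `ready_to_determine`, and the `traversed` set are hypothetical names, and fields are keyed by `(table, field)` tuples for the purposes of the example.

```python
def effective_fields(source_fields, downstream):
    """Source target fields that have at least one downstream field."""
    return {field for field in source_fields if downstream.get(field)}


def ready_to_determine(source_fields, downstream, traversed):
    """True once every effective source field has been traversed."""
    return effective_fields(source_fields, downstream) <= set(traversed)


source_fields = [("src_a", "id_no"), ("src_a", "memo"), ("src_b", "id_no")]
downstream = {
    ("src_a", "id_no"): [("tgt", "cust_key")],
    ("src_b", "id_no"): [("tgt", "cust_key")],
    # ("src_a", "memo") has no downstream table, so it is not effective.
}

print(ready_to_determine(source_fields, downstream, {("src_a", "id_no")}))
print(ready_to_determine(
    source_fields, downstream, {("src_a", "id_no"), ("src_b", "id_no")}
))
```

The first call reports that determination should wait, because one effective field is still untraversed; the second reports that it can proceed. The field without a downstream table never blocks the check.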
According to the technical solution of the data table field attribute determining device, for each source data table, the attribute identifiers of each target field in the current source data table, the attribute statistics times of the attribute identifiers, and whether each target field corresponds to a downstream data table are determined. For each target field, if a downstream data table corresponding to the current target field exists in the at least one non-source data table, the attribute statistics times of each attribute identifier of the current target field are superimposed onto the attribute statistics times of the corresponding attribute identifier of the corresponding target field in that downstream data table. If the corresponding target field in the downstream data table itself corresponds to a further downstream data table, that field is taken as the current target field and the superposition step is repeated, until no downstream data table corresponding to the current target field exists in the at least two non-source data tables. The attribute identifier of each target field in each target data table is then determined according to the attribute statistics times of the attribute identifiers of that target field. This achieves the technical effect of determining the attribute identifier of each target field in the target data table simply, accurately, and quickly.
The data table field attribute determining device provided by the embodiment of the invention can execute the data table field attribute determining method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executing method.
Fig. 5 shows a schematic structural diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data table field attribute determination method.
In some embodiments, the data table field attribute determination method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more of the steps of the data table field attribute determination method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data table field attribute determination method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS services.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited in this respect.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for determining a field attribute of a data table, comprising:
determining a data table set, wherein the data table set comprises at least one source data table and at least two non-source data tables, and the at least two non-source data tables comprise an intermediate data table and a target data table;
for each source data table, determining the attribute identifiers of each target field in the current source data table, the attribute statistics times of the attribute identifiers, and whether each target field corresponds to a downstream data table;
For each target field, if a downstream data table corresponding to the current target field exists in the at least one non-source data table, the attribute statistics times of each attribute identifier of the current target field are overlapped to the attribute statistics times of the corresponding attribute identifiers of the corresponding target fields in the corresponding downstream data table;
if the corresponding target field in the corresponding downstream data table also corresponds to the downstream data table, taking the target field as a current target field, and returning to the step of overlapping the attribute statistics times of each attribute identifier of the current target field to the attribute statistics times of the corresponding attribute identifier of the corresponding target field in the corresponding downstream data table until the downstream table corresponding to the current target field does not exist in the at least two non-source data tables;
and determining the attribute identification of each target field in each target data table according to the attribute statistics times of the attribute identifications of each target field in the target data table.
2. The method of claim 1, wherein each of the target fields in the source data table carries an attribute identification of at least one attribute combination, each of the attribute combinations comprising at least two attribute identifications.
3. The method of claim 2, wherein each of the target fields carries at least one attribute combination of a sensitive combination, a security level combination, an execution standard combination, and an associated dictionary combination.
4. The method of claim 1, wherein determining the attribute identifier of each target field in each target data table according to the attribute statistics of the attribute identifiers of each target field in the target data table comprises:
for each target field in the target data table, determining the current target field, all effective attribute combinations of the current target field, and the attribute identifier corresponding to the maximum statistics times in each effective attribute combination;
for each effective attribute combination, if the attribute statistics times of only one attribute identifier in the current effective attribute combination is the maximum statistics times, the attribute identifier corresponding to the maximum statistics times is used as one attribute identifier of the current target field.
5. The method as recited in claim 4, further comprising:
and if the attribute statistics times of at least two attribute identifiers in the current effective attribute combination are the maximum statistics times, taking the attribute identifier with the highest priority in the at least two attribute identifiers as one attribute identifier of the current target field.
6. The method of claim 5, wherein:
for sensitive combinations, the priority of sensitive identifications is higher than the priority of non-sensitive identifications;
for the security level combination, if the first security level is higher than the second security level, the priority of the security level identifier corresponding to the first security level is higher than the priority of the security level identifier corresponding to the second security level;
for the execution standard combination, if the first execution standard is higher than the second execution standard, the priority of the execution standard identification corresponding to the first execution standard is higher than the priority of the execution standard identification corresponding to the second execution standard.
7. The method of claim 1, wherein determining the attribute identifier of each target field in each target data table according to the attribute statistics of the attribute identifiers of each target field in the target data table comprises:
upon detecting that all effective target fields in the source data table have been traversed, determining the attribute identifier of each target field in each target data table according to the attribute statistics times of the attribute identifiers of each target field in the target data table, wherein an effective target field is a target field that has a corresponding downstream data table.
8. A data table field attribute determining apparatus, comprising:
the data table determining module is used for determining a data table set, wherein the data table set comprises at least one source data table and at least two non-source data tables, and the at least two non-source data tables comprise an intermediate data table and a target data table;
the starting module is used for determining, for each source data table, the attribute identifiers of each target field in the current source data table, the attribute statistics times of the attribute identifiers, and whether each target field corresponds to a downstream data table;
the propagation module is used for superposing the attribute statistics times of the attribute identifications of the current target field to the attribute statistics times of the corresponding attribute identifications of the corresponding target fields in the corresponding downstream data table if the downstream data table corresponding to the current target field exists in the at least one non-source data table for each target field;
the iteration module is used for taking the target field as a current target field and returning to the step of overlapping the attribute statistics times of all attribute identifiers of the current target field to the attribute statistics times of the corresponding attribute identifiers of the corresponding target field in the corresponding downstream data table if the corresponding target field in the corresponding downstream data table also corresponds to the downstream data table until the downstream tables corresponding to the current target field do not exist in the at least two non-source data tables;
And the determining module is used for determining the attribute identification of each target field in each target data table according to the attribute statistics times of the attribute identifications of each target field in the target data table.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data table field attribute determination method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the data table field attribute determination method of any one of claims 1-7.
CN202311643243.2A 2023-12-04 2023-12-04 Data table field attribute determining method, device, equipment and storage medium Pending CN117609237A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311643243.2A CN117609237A (en) 2023-12-04 2023-12-04 Data table field attribute determining method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117609237A true CN117609237A (en) 2024-02-27

Family

ID=89955962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311643243.2A Pending CN117609237A (en) 2023-12-04 2023-12-04 Data table field attribute determining method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117609237A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination