WO2022121337A1 - Data exploration method and apparatus, and electronic device and storage medium - Google Patents

Data exploration method and apparatus, and electronic device and storage medium

Info

Publication number
WO2022121337A1
Authority
WO
WIPO (PCT)
Prior art keywords
probed
probe
field
data table
fields
Prior art date
Application number
PCT/CN2021/109589
Other languages
French (fr)
Chinese (zh)
Inventor
霍康
万月亮
火一莽
Original Assignee
北京锐安科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京锐安科技有限公司
Publication of WO2022121337A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results

Definitions

  • the embodiments of the present application relate to the technical field of data analysis, for example, to a data exploration method, apparatus, electronic device, and storage medium.
  • a plurality of independent detection scripts are generally written manually to probe the raw data table by table to analyze the quality of the raw data; and a plurality of independent detection scripts are directly used to execute corresponding task-based queries.
  • every time data detection is performed, repetitive detection scripts need to be written manually, which consumes a lot of manpower and reduces data detection efficiency.
  • Embodiments of the present application provide a data exploration method, device, electronic device, and storage medium, so as to automatically complete the exploration and analysis of data quality in data tables of different database types, which is convenient to operate and improves the efficiency of data exploration.
  • an embodiment of the present application provides a data detection method, which is applied to a data table detection device, and the method includes:
  • the fields to be detected are respectively detected, and the detection results are determined, wherein the detection rules include at least one of field filling detection rules, feature value detection rules, field length detection rules, and field dictionary code detection rules.
  • an embodiment of the present application further provides a data table detection device, including:
  • a target data table determination module configured to match at least one target data table from at least one database in a connected state according to the probe scope condition
  • a to-be-explored field determination module configured to acquire a data structure of the at least one target data table, and to determine a to-be-explored field in the at least one target data table according to the data structure of the at least one target data table;
  • a probe result determination module configured to probe the fields to be probed respectively based on the preset probe rules, and determine the probe results, wherein the probe rules include at least one of field filling probe rules, feature value probe rules, field length probe rules, and field dictionary code probe rules.
  • an embodiment of the present application further provides an electronic device, the electronic device comprising:
  • one or more processors;
  • the one or more processors are configured to execute the one or more programs to implement the data detection method provided by any embodiment of the present application.
  • an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the data probing method provided by any embodiment of the present application is implemented.
  • FIG. 1 is a schematic flowchart of a data exploration method provided in Embodiment 1 of the present application;
  • FIG. 2 is a schematic structural diagram of a data table detection device provided in Embodiment 2 of the present application.
  • FIG. 3 is a schematic structural diagram of an electronic device provided in Embodiment 3 of the present application.
  • FIG. 1 is a flowchart of a data exploration method provided in Embodiment 1 of the present application, and this embodiment may be applied to a situation in which performance testing is performed in software testing.
  • the method may be performed by a data table look-up device, which may be implemented by means of software and/or hardware.
  • the data exploration method provided in the first embodiment of the present application can address the low data detection efficiency described above: the profiling results are integrated, and a profiling report is generated by combining the profiling results with the data table to be profiled, which makes it convenient for data analysts to analyze the data quality of the profiled data.
  • the configuration file of the data table detection device is pre-configured, so that when the data table detection device executes the data detection method, the configuration parameters in the configuration file can be read directly and the data detection method can be executed successfully.
  • the configuration file includes configuration parameters such as the data type of the source data, the database instance name, the database connection method, the database instance, the database user name, and the password.
  • the configuration file is read and executed, and at least one database is connected according to the connection mode of at least one database in the configuration file.
  • the database includes Oracle, MySQL, MPP, Hive, txt, Excel, csv, Word and other databases.
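The configuration step described above can be sketched as follows. This is a minimal illustration only: the section name `source_1`, the key names, and the helper functions `load_sources` and `connect` are all hypothetical, and an in-memory SQLite database stands in for the Oracle, MySQL, or Hive connections that the text mentions.

```python
import configparser
import sqlite3

# Hypothetical configuration; section and key names are illustrative only.
CONFIG_TEXT = """
[source_1]
data_type = sqlite
instance_name = demo
connection_mode = file
database = :memory:
user = probe_user
password = secret
"""


def load_sources(text):
    """Parse the configuration text into one dict per configured source."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def connect(source):
    """Open a connection for one source; only the SQLite stand-in is wired up."""
    if source["data_type"] == "sqlite":
        return sqlite3.connect(source["database"])
    raise NotImplementedError(f"no connector registered for {source['data_type']}")


sources = load_sources(CONFIG_TEXT)
connections = {name: connect(cfg) for name, cfg in sources.items()}
```

A real deployment would register one connector per database type and keep credentials out of plain-text files; the sketch only shows the read-then-connect order.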
  • ADO (ActiveX Data Objects)
  • the method includes the following steps.
  • the probe scope condition may be a probe condition for obtaining at least one target database set according to requirements. For example, when querying the all_tab_comments table to obtain a list of library tables, a table name filter condition of table_name may be added.
  • the target data table may be a data table to be probed that is determined, according to the probe scope condition, from a plurality of databases in a connected state. The number of target data tables may be one or more.
  • the probe scope condition and the matching mode may be acquired, a matching instruction may be generated based on the probe scope condition and the matching mode, and the matching instruction may be executed to determine at least one target data table in at least one database.
  • the matching mode includes any one of an exact matching parameter, a fuzzy matching parameter, an exact exclusion parameter and a fuzzy exclusion parameter: the exact match parameter is extract_match, the fuzzy match parameter is fuzzy_match, the exact exclusion parameter is exact_not_match, and the fuzzy exclusion parameter is fuzzy_not_match.
  • the matching instruction may be a database matching instruction generated based on the matching mode and the probe scope condition; executing it determines the at least one target data table to be probed.
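One way to picture the matching instruction is as a parameterised query against `all_tab_comments`. The sketch below is an assumption, not the applicant's implementation: the function name `build_match_query`, the mapping from each matching mode to a SQL operator, and the `%...%` wildcard wrapping are all illustrative. The parameter names are copied from the text as written, including the spelling `extract_match`.

```python
def build_match_query(mode, pattern):
    """Return (sql, params) that lists tables for one matching mode.

    The mode-to-operator mapping is an assumption made for illustration.
    """
    operators = {
        "extract_match": "table_name = :pattern",        # exact match, spelled as in the text
        "fuzzy_match": "table_name LIKE :pattern",       # fuzzy match
        "exact_not_match": "table_name <> :pattern",     # exact exclusion
        "fuzzy_not_match": "table_name NOT LIKE :pattern",  # fuzzy exclusion
    }
    if mode not in operators:
        raise ValueError(f"unknown matching mode: {mode}")
    # Fuzzy modes match any table name containing the pattern.
    value = f"%{pattern}%" if mode.startswith("fuzzy") else pattern
    sql = f"SELECT table_name, comments FROM all_tab_comments WHERE {operators[mode]}"
    return sql, {"pattern": value}


sql, params = build_match_query("fuzzy_match", "HOTEL")
```

Passing the pattern as a bind variable rather than formatting it into the SQL string keeps the scope condition from being interpreted as SQL.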
  • S120 Acquire the data structure of the at least one target data table, and determine the fields to be probed in the at least one target data table according to the data structure of the at least one target data table.
  • the data structure may be a way of storing the data to be probed in the data table.
  • the fields to be probed may be fields in the data structure of the target data table, such as fields such as field name, field description, field type, and field length. For example, multiple fields in the data structure can be obtained, preset fields to be explored can be obtained as fields to be probed, all fields can be used as fields to be probed, and fields to be probed can also be set according to actual conditions.
  • the number of the fields to be probed is determined, and when the number of the fields to be probed is greater than a preset number, the data of the fields to be probed is sampled, and the sampled data is determined as the data to be probed corresponding to the fields to be probed. For example, when the number of fields to be probed in the data table exceeds the preset number and the data of the fields to be probed contains a large amount of duplicate data, random sampling is performed on the data in the fields to be probed, and data exploration is conducted on the sampled data.
  • the random sampling method can ensure the validity of the exploration results, and reducing the amount of data can also reduce the computational complexity of the data exploration and improve the efficiency of the exploration.
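The sampling rule above can be sketched with the standard library. The function name `sample_if_large` and its parameters `preset_number` and `sample_size` are hypothetical; the text does not state how large the sample should be, so the size is left to the caller.

```python
import random


def sample_if_large(values, preset_number, sample_size, seed=None):
    """Return all values when few, otherwise a random sample without replacement."""
    values = list(values)
    if len(values) <= preset_number:
        return values
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(values, min(sample_size, len(values)))


small = sample_if_large([1, 2, 3], preset_number=5, sample_size=2)
large = sample_if_large(range(100), preset_number=10, sample_size=20, seed=0)
```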
  • the detection rule is a detection indicator used to probe and analyze the field to be probed, and includes at least one of a field filling detection rule, a feature value detection rule, a field length detection rule, and a field dictionary code detection rule.
  • the detection rules are set in advance.
  • the field to be probed is probed and analyzed according to the probe rules, and the probe result of the field to be probed is determined.
  • the detection result includes at least one of the detection filling rate, the feature value coincidence rate, the maximum field length, and the dictionary code corresponding to the field to be detected.
  • it is checked whether the data corresponding to the fields to be probed is filled, the number of filled fields is determined, and the percentage of the number of filled fields in the number of fields to be probed is determined as the probe fill rate.
  • the number of filled fields may be the number of fields that have filled values among the fields to be probed.
  • the determined probe fill rate is presented in a probe report.
  • the formula to calculate the probe fill rate could be: R1 = N1 / M1 × 100%
  • R1 represents the probe fill rate
  • N1 represents the number of filled fields in the fields to be probed
  • M1 represents the number of fields to be probed.
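As a worked illustration of the fill rate R1 = N1 / M1 × 100%, the sketch below counts filled entries in a list of values. The function name `fill_rate` is hypothetical, and treating `None` and blank strings as unfilled is an assumption; the text does not define what counts as filled.

```python
def fill_rate(values):
    """Return the percentage of entries that carry a value (R1 = N1 / M1 * 100)."""
    total = len(values)  # M1
    if total == 0:
        return 0.0
    # N1: entries that are neither None nor blank after trimming whitespace.
    filled = sum(1 for v in values if v is not None and str(v).strip() != "")
    return filled / total * 100


rate = fill_rate(["a", "", None, "b"])
```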
  • the validity of at least one feature value corresponding to the field to be probed is probed, and the percentage of the number of valid feature values in the field to be probed in the number of fields to be probed is determined as the feature value coincidence rate.
  • the feature value means content capable of identifying real-world entity information, and each feature value has a unique corresponding feature value type.
  • the feature value coincidence rate measures how well the content of the analyzed data conforms to the applicable norm; a feature value that conforms to the norm is a valid feature value. For example, before calculating the feature value coincidence rate, the feature type corresponding to the feature value in the current field to be probed may be identified in order to determine the feature value in the current field to be probed.
  • for example, according to the field description (for example, hotel location, hotel address, hotel details) of the field to be probed, the feature type corresponding to the feature value in the field is determined to be hotel address, and the unique feature value in the current field to be probed is determined from the hotel address to be the hotel.
  • as another example, the feature type corresponding to the feature value in the field to be probed is the license plate number, and the unique feature value in the field to be probed is determined from the license plate number to be the license plate.
  • a feature value of the field to be probed is acquired, and a feature value check is performed on the feature value.
  • the feature type that can perform feature value verification is preset as a preset feature type, and when it is determined that the feature type of the current feature value belongs to the preset feature type, the feature value verification is performed on the feature value belonging to the preset feature type.
  • the preset eigenvalue type is an eigenvalue type conforming to a regular expression.
  • the feature value in the to-be-explored field corresponding to the feature type is checked according to a verification method predefined for that feature type (for example, the check_carnum verification method).
  • the coincidence rate of the feature value of the current field to be probed is calculated from the valid feature values that pass verification and all the feature values participating in the verification. In one embodiment, when the obtained feature value coincidence rate does not meet the preset threshold, the feature value coincidence rate is displayed in the generated exploration report. The calculation formula of the feature value coincidence rate can be: R2 = N2 / M2 × 100%
  • R2 represents the feature value coincidence rate
  • N2 represents the number of valid feature values in the to-be-explored field
  • M2 represents the number of all feature values in the to-be-explored field.
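The coincidence rate R2 = N2 / M2 × 100% can be illustrated with a regular-expression check. The text names a `check_carnum` verification method but does not give its rules, so the pattern below (one Chinese character, one uppercase letter, five uppercase letters or digits) is a simplified assumption, and `coincidence_rate` is a hypothetical helper name.

```python
import re

# Simplified, assumed plate format; the real check_carnum rules are not given in the text.
CARNUM_PATTERN = re.compile(r"[\u4e00-\u9fa5][A-Z][A-Z0-9]{5}")


def check_carnum(value):
    """Return True when the value matches the assumed license plate format."""
    return bool(CARNUM_PATTERN.fullmatch(value))


def coincidence_rate(values, check):
    """Return the percentage of values that pass the check (R2 = N2 / M2 * 100)."""
    if not values:
        return 0.0
    valid = sum(1 for v in values if check(v))  # N2
    return valid / len(values) * 100            # M2 = len(values)


rate = coincidence_rate(["京A12345", "沪B6C789", "12345", "abc"], check_carnum)
```

Passing the check as a function lets each preset feature type supply its own verification method without changing the rate calculation.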
  • the content length corresponding to the field to be probed is probed, and the maximum value of the field length of the field to be probed is determined according to the content length.
  • the extreme value of the field length includes the longest value of the field and the shortest value of the field. For example, the content length of the data corresponding to multiple fields to be probed is determined, the content lengths of the multiple fields are compared to determine the longest value or shortest value of the multiple fields to be probed, and the result is then displayed in the generated probe report.
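A minimal sketch of the field length probe follows. The function name `length_extremes` is hypothetical, and skipping `None` entries while counting empty strings as length 0 is an assumption.

```python
def length_extremes(values):
    """Return the longest and shortest content length, ignoring missing values."""
    lengths = [len(str(v)) for v in values if v is not None]
    if not lengths:
        return None
    return {"longest": max(lengths), "shortest": min(lengths)}


extremes = length_extremes(["ab", "abcd", None, ""])
```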
  • the description information of the field to be searched is searched, and the dictionary code corresponding to the field to be searched is determined according to the description information.
  • the dictionary code may be a gender code, a certificate type code, or the like.
  • a preset identification method is used to identify the dictionary code in the description information in the field to be probed, and the identification result is displayed in the probe report.
  • the preset identification method may be a neural network identification model, or the identification result may be determined according to the input identification information.
  • the dictionary code can be displayed in the probe report in an enumerated manner.
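The text leaves the identification method open (it mentions a neural network model as one option). As a much simpler stand-in, the sketch below enumerates dictionary codes from a field description, assuming a hypothetical description format in which codes appear in parentheses as `code=meaning` pairs. The function name `extract_dictionary_codes` is also hypothetical.

```python
import re


def extract_dictionary_codes(description):
    """Enumerate code/meaning pairs from a description such as 'x (1=a, 2=b)'."""
    match = re.search(r"\(([^)]*)\)", description)
    if not match:
        return {}
    codes = {}
    for item in match.group(1).split(","):
        if "=" in item:
            code, meaning = item.split("=", 1)
            codes[code.strip()] = meaning.strip()
    return codes


gender_codes = extract_dictionary_codes("gender code (1=male, 2=female)")
```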
  • the profiling results are integrated, and a profiling report is generated based on the integrated profiling results and the target data tables.
  • the exploration report can be displayed in the form of an Excel table.
  • a probe report includes a probe catalog summary and a probe detail.
  • the general table of exploration catalogue includes the target data table, the data quantity of each target data table, the number of fields, the quantity of feature types, and the feature type information.
  • the probe directory summary table is used to represent the overall statistical information of the current data results to be probed, which is convenient for data analysts to understand the data table to be probed and the basic information of each field in the data table in the current data probe and analysis process.
  • the probe detailed list includes: probe analysis results of the fields to be probed and sample data of the probe analysis results.
  • the probe detailed list is used to represent the detailed information of the multiple probe results in the probe analysis results of the current field to be probed, which makes it convenient for data analysts to analyze the multiple fields to be probed and optimize data performance through the detailed information of the probe results.
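The two-part report (catalog summary plus detailed list) can be sketched as two row sets. The text describes an Excel report; writing Excel files needs a third-party library, so this illustration swaps in CSV text from the standard library. The input shape, the column names, and the helper names `build_report` and `to_csv` are all assumptions.

```python
import csv
import io


def build_report(tables):
    """Build summary and detail rows.

    tables: {table_name: {"row_count": int, "fields": {field_name: result_dict}}}
    """
    summary = [["table", "row_count", "field_count"]]
    detail = [["table", "field", "fill_rate", "max_length"]]
    for name, info in tables.items():
        summary.append([name, info["row_count"], len(info["fields"])])
        for field, result in info["fields"].items():
            detail.append([name, field, result["fill_rate"], result["max_length"]])
    return summary, detail


def to_csv(rows):
    """Serialise rows to CSV text (a stand-in for an Excel sheet)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


tables = {"hotel": {"row_count": 3, "fields": {"addr": {"fill_rate": 100.0, "max_length": 12}}}}
summary, detail = build_report(tables)
detail_csv = to_csv(detail)
```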
  • at least one target data table is matched from at least one database in a connected state according to the probe scope condition; the data structure of the at least one target data table is acquired, and the fields to be probed in the at least one target data table are determined according to the data structure of the at least one target data table; based on the preset probe rules, the fields to be probed are probed respectively and the probe results are determined; and the data quality of the fields to be probed is determined according to the field filling probe result, the feature value probe result, the field length probe result and the field dictionary code probe result in the probe results.
  • the following is an embodiment of the data table detection device provided by the embodiments of the present application, which belongs to the same inventive concept as the data detection method of the above-mentioned embodiment.
  • for details not described in the embodiment of the data table detection device, please refer to the embodiment of the above data detection method.
  • FIG. 2 is a schematic structural diagram of a data table detection apparatus provided in Embodiment 2 of the present application, and this embodiment can be applied to a situation in which performance testing is performed in software testing.
  • the data table detection device includes: a target data table determination module 210, a to-be-explored field determination module 220, and a probe result determination module 230, wherein:
  • the target data table determination module 210 is configured to match at least one target data table from at least one database in the connected state according to the condition of the probe scope.
  • the to-be-explored field determination module 220 is configured to acquire the data structure of the at least one target data table, and determine the to-be-explored field in the at least one target data table according to the data structure of the at least one target data table.
  • the probe result determination module 230 is configured to probe the fields to be probed respectively based on the preset probe rules, and determine the probe results, wherein the probe rules include at least one of field filling probe rules, feature value probe rules, field length probe rules, and field dictionary code probe rules.
  • at least one target data table is matched from at least one database in a connected state according to the probe scope condition; the data structure of the at least one target data table is acquired, and the fields to be probed in the at least one target data table are determined according to the data structure of the at least one target data table; based on the preset probe rules, the fields to be probed are probed respectively and the probe results are determined; and the data quality of the fields to be probed is determined according to the field filling probe result, the feature value probe result, the field length probe result and the field dictionary code probe result in the probe results.
  • the data table detection device further includes:
  • the database connection unit is configured to read the configuration file, and connect the at least one database according to the connection mode of the at least one database in the configuration file.
  • the target data table determination module 210 includes:
  • a target data table determination unit configured to acquire the probe range condition and the matching mode, generate a matching instruction based on the probe range condition and the matching mode, and execute the matching instruction to determine at least one target data table in at least one database, wherein the matching mode includes any one of exact matching parameters, fuzzy matching parameters, exact exclusion parameters and fuzzy exclusion parameters.
  • the data table detection device further includes:
  • a probe data determination unit configured to determine the number of the fields to be probed and, in response to the number of the fields to be probed being greater than a preset number, sample the data of the fields to be probed and determine the data obtained by sampling as the data to be probed corresponding to the fields to be probed.
  • the detection result determination module 230 includes:
  • a first probing result determining unit configured to probe whether the data to be probed corresponding to the fields to be probed is filled, determine the number of filled fields, and determine the percentage of the number of filled fields in the number of fields to be probed as the probe fill rate;
  • a second detection result determination unit configured to detect the validity of at least one feature value corresponding to the field to be detected, and to determine the percentage of the number of valid feature values in the field to be detected accounting for the number of fields to be detected, determining the eigenvalue coincidence rate;
  • a third detection result determination unit configured to detect the content length corresponding to the field to be probed, and to determine the maximum value of the field length of the field to be probed according to the content length;
  • the fourth detection result determination unit is configured to detect the description information of the field to be searched, and determine the dictionary code corresponding to the field to be searched according to the description information.
  • the second detection result determination unit includes:
  • a feature value checking unit configured to, in response to determining that the field to be checked belongs to a preset feature type, obtain a feature value of the field to be checked, and perform feature value check on the feature value.
  • the data table detection device is further configured to integrate the detection results, and generate a detection report based on the integrated detection results and the target data table.
  • the data table detection apparatus provided by the embodiment of the present application can execute the data detection method provided by any embodiment of the present application, and has functional modules corresponding to the execution method.
  • the multiple units and modules included in the above-mentioned embodiments of the data table detection apparatus are only divided according to functional logic, but are not limited to the above-mentioned division manner; in addition, the names of all functional units are only for the convenience of distinguishing from each other.
  • FIG. 3 is a schematic structural diagram of an electronic device provided in Embodiment 3 of the present application.
  • FIG. 3 shows a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present application.
  • the electronic device 12 shown in FIG. 3 is only one example.
  • the electronic device 12 takes the form of a general-purpose computing electronic device.
  • the components of the electronic device 12 may include: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the various system components including the system memory 28 and the processing unit 16.
  • System memory 28 may be memory 28 .
  • the bus 18 represents at least one type of bus structure, e.g., the bus 18 includes a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
  • these architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the peripheral component interconnect (PCI) bus.
  • Electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 12, including both volatile and non-volatile media, removable and non-removable media.
  • Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache 32 .
  • Electronic device 12 may include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 34 may be configured to read and write to non-removable, non-volatile magnetic media, not shown in FIG. 3, commonly referred to as hard disk drives.
  • a magnetic disk drive for reading from and writing to removable non-volatile magnetic disks (such as floppy disks), and an optical disk drive for reading from and writing to removable non-volatile optical disks (for example, CD-ROM, DVD-ROM or other optical media), may be provided.
  • each drive may be connected to bus 18 through one or more data media interfaces.
  • the memory 28 may include at least one program product having a set of (e.g., at least one) program modules configured to perform the functions of the embodiments of the present application.
  • a program/utility 40 having a set of at least one program module 42 may be stored, for example, in the memory 28; such program modules 42 include an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment.
  • Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
  • the electronic device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (e.g., network card, modem, etc.) that enables the electronic device 12 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through a network adapter 20. As shown in FIG. 3, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. Although not shown in FIG.
  • the processing unit 16 executes a variety of functional applications and sample data acquisition operations by running the program stored in the memory 28, for example, to implement the steps of a data detection method provided by the embodiment of the present application, and the data detection method includes:
  • the fields to be detected are respectively detected, and the detection results are determined, wherein the detection rules include at least one of field filling detection rules, feature value detection rules, field length detection rules, and field dictionary code detection rules.
  • the fourth embodiment provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the data detection method provided by the foregoing embodiments of the present application is implemented.
  • data exploration methods include:
  • the fields to be detected are respectively detected, and the detection results are determined, wherein the detection rules include at least one of field filling detection rules, feature value detection rules, field length detection rules, and field dictionary code detection rules.
  • the computer storage medium provided by the embodiments of the present application may adopt any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above.
  • Examples (non-exhaustive list) of computer-readable storage media include: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), Erasable Programmable Read-Only Memory (EPROM), flash memory, optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied in the computer-readable signal medium. Such propagated data signals may take a variety of forms, including electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the program code embodied on the computer-readable medium may be transmitted by any suitable medium, including: wireless, wire, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the above.
  • Computer program code for carrying out the operations of this application may be written in one or more programming languages or a combination of programming languages, including object-oriented programming languages, such as Java, Smalltalk and C++, as well as conventional procedural programming languages, such as C or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer, for example through the Internet using an Internet service provider.
  • multiple modules or multiple steps of the present application can be implemented by a general-purpose computing device, and they can be centralized on a single computing device or distributed over a network composed of multiple computing devices.
  • they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be fabricated as a separate integrated circuit module, or multiple modules or steps among them can be implemented as a single integrated circuit module. In this way, the embodiments of the present application can take the form of various combinations of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a data exploration method and apparatus, and an electronic device and a storage medium. The method comprises: according to a condition for an exploration range, matching at least one target data table from a database in a connected state; acquiring a data structure of the at least one target data table, and determining fields, to be explored, in the at least one target data table; and on the basis of a preset exploration rule, respectively exploring said fields, and determining an exploration result.

Description

Data exploration method and apparatus, electronic device, and storage medium
The present disclosure claims priority to Chinese patent application No. 202011462110.1, filed with the China Patent Office on December 11, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of data analysis, for example to a data exploration method and apparatus, an electronic device, and a storage medium.
Background
With the advent of the big data era, the use of data is becoming increasingly important, and more and more applications and services are built on data. Moreover, data quality is the basis for the validity and accuracy of the conclusions of data analysis and data mining, and a prerequisite for data-driven decision-making.
In the related art, multiple independent detection scripts are generally written by hand to explore the raw data table by table and analyze its quality, and these scripts are used directly to execute the corresponding task-style queries. However, if repetitive detection scripts must be written by hand every time data is checked, a great deal of manpower is consumed and the efficiency of data detection is reduced.
Summary
Embodiments of the present application provide a data exploration method and apparatus, an electronic device, and a storage medium, so that the exploration and analysis of data quality in data tables of different database types is completed automatically, which is convenient to operate and improves the efficiency of data exploration.
In a first aspect, an embodiment of the present application provides a data exploration method, applied to a data table exploration apparatus, the method comprising:
matching at least one target data table from at least one database in a connected state according to an exploration range condition;
acquiring a data structure of the at least one target data table, and determining fields to be explored in the at least one target data table according to the data structure of the at least one target data table; and
exploring the fields to be explored respectively based on preset exploration rules, and determining an exploration result, wherein the exploration rules comprise at least one of a field filling exploration rule, a feature value exploration rule, a field length exploration rule, and a field dictionary code exploration rule.
In a second aspect, an embodiment of the present application further provides a data table exploration apparatus, comprising:
a target data table determination module, configured to match at least one target data table from at least one database in a connected state according to an exploration range condition;
a to-be-explored field determination module, configured to acquire a data structure of the at least one target data table, and to determine fields to be explored in the at least one target data table according to the data structure of the at least one target data table; and
an exploration result determination module, configured to explore the fields to be explored respectively based on preset exploration rules, and to determine an exploration result, wherein the exploration rules comprise at least one of a field filling exploration rule, a feature value exploration rule, a field length exploration rule, and a field dictionary code exploration rule.
In a third aspect, an embodiment of the present application further provides an electronic device, the electronic device comprising:
one or more processors; and
a storage apparatus, configured to store one or more programs,
wherein the one or more processors are configured to execute the one or more programs to implement the data exploration method provided by any embodiment of the present application.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the data exploration method provided by any embodiment of the present application.
Brief Description of the Drawings
To illustrate the embodiments of the present application, the drawings used in describing the embodiments are briefly introduced below.
FIG. 1 is a schematic flowchart of the data exploration method provided in Embodiment 1 of the present application;
FIG. 2 is a schematic structural diagram of the data table exploration apparatus provided in Embodiment 2 of the present application;
FIG. 3 is a schematic structural diagram of the electronic device provided in Embodiment 3 of the present application.
Detailed Description
The present application is described below with reference to the drawings and embodiments.
Embodiment 1
FIG. 1 is a flowchart of the data exploration method provided in Embodiment 1 of the present application. This embodiment is applicable to performance testing in software testing. The method may be executed by a data table exploration apparatus, which may be implemented in software and/or hardware.
Before Embodiment 1 of the present application is introduced, its application scenario is described.
In a data analysis scenario, the validity and accuracy of the data should be ensured before the data is analyzed. To guarantee data quality and data availability, technicians in the related art usually write detection scripts and use them to explore the raw data quality of the database currently under examination. Such scripts cannot be shared across databases of different types, so a great deal of manpower and material resources is consumed and data detection is inefficient.
The data exploration method provided in Embodiment 1 of the present application addresses the low data detection efficiency described above. In Embodiment 1, a data table exploration apparatus is provided to carry out the exploration, analysis, and statistics of data in multiple types of databases; after the exploration ends, the apparatus integrates the exploration results and generates an exploration report from the exploration results and the explored data tables, so that data analysts can readily assess the quality of the explored data. To keep the data table exploration apparatus general and flexible, a configuration file is prepared for it in advance, so that when the apparatus executes the data exploration method it can read the configuration parameters directly from the configuration file and run the method smoothly. The configuration file includes configuration parameters such as the data type of the source data, the database instance name, the database connection mode, the database instance, and the database user name and password.
In one embodiment, the configuration file is read and run, and at least one database is connected according to the connection mode of the at least one database specified in the configuration file. The databases include Oracle, mysql, mpp, hive, txt, excel, csv, word, and other databases. For example, the database connection may be made using ADO (ActiveX Data Objects); a variety of connection modes exist.
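The configuration step can be illustrated with a minimal sketch. The source lists the kinds of parameters the configuration file holds but not its format, so the INI layout, the `[source]` section, and every key name below are assumptions made for illustration; no real connection is opened.

```python
import configparser

# Hypothetical configuration text; the section and key names are assumptions,
# chosen to mirror the parameter kinds listed in the description.
SAMPLE_CONFIG = """
[source]
db_type = oracle
instance_name = ORCL
connect_mode = jdbc
username = explore_user
password = secret
"""


def load_source_config(text):
    """Parse the configuration text and return the [source] section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["source"]
    return {key: section[key] for key in section}


config = load_source_config(SAMPLE_CONFIG)
print(config["db_type"], config["instance_name"])
```

A real apparatus would pass these values to whichever database driver matches `db_type`; that dispatch is left out here because the source does not describe it.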
In Embodiment 1 of the present application, as shown in FIG. 1, the method includes the following steps.
S110. Match at least one target data table from at least one database in a connected state according to an exploration range condition.
In this embodiment of the present application, the exploration range condition may be a condition, set according to requirements, for obtaining the at least one target data table. For example, when the all_tab_comments table is queried to obtain the list of tables in a database, a table name filter condition on table_name may be added. The target data table may be a data table to be explored that is determined from the multiple connected databases according to the exploration range condition. There may be one target data table or several.
In one embodiment, the exploration range condition and a matching mode may be acquired, a matching instruction may be generated based on the exploration range condition and the matching mode, and the matching instruction may be executed to determine at least one target data table in at least one database, wherein the matching mode includes any one of an exact match parameter, a fuzzy match parameter, an exact exclusion parameter, and a fuzzy exclusion parameter. The exact match parameter is extract_match, the fuzzy match parameter is fuzzy_match, the exact exclusion parameter is exact_not_match, and the fuzzy exclusion parameter is fuzzy_not_match. The matching instruction may be a database matching instruction generated from the matching mode and the exploration range condition, and is used to determine the at least one target data table to be explored.
For example, when it is not necessary to explore and analyze all data tables in the database, the at least one target data table to be explored can be determined by the database matching instruction generated from the matching mode and the exploration range condition. Exemplarily, to explore the tables in an Oracle database whose names start with T_ or G_ or that are named PERSON_INFO, while excluding tables whose names are marked with _TMP, the following configuration can be used: extract_match=PERSON_INFO; fuzzy_match=T_,G_; exact_not_match=_TMP.
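The table matching described above can be sketched as a filter over a list of table names. The source does not define the precise semantics of the four parameters, so this sketch assumes that exact parameters compare the whole table name, that `fuzzy_match` is a prefix test (following the "starts with T_" example), and that `fuzzy_not_match` is a substring test. Under those assumptions the `_TMP` exclusion is passed as a fuzzy exclusion here. The function name `select_tables` is hypothetical.

```python
def select_tables(table_names, extract_match=(), fuzzy_match=(),
                  exact_not_match=(), fuzzy_not_match=()):
    """Return the table names that satisfy the inclusion rules and no exclusion rule."""
    selected = []
    for name in table_names:
        # Inclusion: full-name equality or an assumed prefix match.
        included = (name in extract_match
                    or any(name.startswith(prefix) for prefix in fuzzy_match))
        # Exclusion: full-name equality or an assumed substring match.
        excluded = (name in exact_not_match
                    or any(part in name for part in fuzzy_not_match))
        if included and not excluded:
            selected.append(name)
    return selected


tables = ["T_ORDER", "G_USER", "PERSON_INFO", "T_ORDER_TMP", "LOG"]
targets = select_tables(
    tables,
    extract_match=["PERSON_INFO"],
    fuzzy_match=["T_", "G_"],
    fuzzy_not_match=["_TMP"],
)
print(targets)
```

In a real deployment the same conditions would more likely be pushed into the catalog query (for example as a filter on table_name) rather than applied in memory.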
S120. Acquire the data structure of the at least one target data table, and determine the fields to be explored in the at least one target data table according to the data structure of the at least one target data table.
In this embodiment, the data structure may be the way in which the data to be explored is stored in the data table. The fields to be explored may be fields in the data structure of the target data table, for example the field name, field description, field type, and field length. For example, multiple fields in the data structure may be acquired; preset exploration fields may be taken as the fields to be explored, all fields may be taken as the fields to be explored, or the fields to be explored may be set according to the actual situation.
In one embodiment, the number of the fields to be explored is determined; when that number is greater than a preset number, the data of the fields to be explored is sampled, and the sampled data is determined as the data to be explored for those fields. For example, when the number of fields to be explored in the data table exceeds the preset number and the data of a field to be explored contains a large amount of duplicate data, the data in that field is randomly sampled and data exploration is performed on the field after sampling. Random sampling preserves the validity of the exploration result, and reducing the amount of data also reduces the computation required and improves exploration efficiency.
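The sampling step can be sketched as follows. This is a minimal illustration assuming the data of one field is available as an in-memory list; the function name `sample_values`, the `max_count` threshold, and the optional `seed` are all hypothetical and not taken from the source.

```python
import random


def sample_values(values, max_count, seed=None):
    """Return all values if there are few enough, otherwise a random sample.

    Sampling is without replacement, so the sample size equals max_count.
    """
    values = list(values)
    if len(values) <= max_count:
        return values
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(values, max_count)


data = list(range(100))
sampled = sample_values(data, 10, seed=1)
print(len(sampled))
```

For large tables the sampling would normally be done by the database itself rather than after loading every row; the in-memory form is used here only to keep the sketch self-contained.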
S130. Explore the fields to be explored respectively based on preset exploration rules, and determine an exploration result, wherein the exploration rules include at least one of a field filling exploration rule, a feature value exploration rule, a field length exploration rule, and a field dictionary code exploration rule.
The exploration rules are the indicators used to explore and analyze the fields to be explored, and include at least one of a field filling exploration rule, a feature value exploration rule, a field length exploration rule, and a field dictionary code exploration rule. For example, the exploration rules are set in advance, before data exploration is performed. The fields to be explored are explored and analyzed according to the exploration rules, and the exploration result of the fields is determined. The exploration result includes at least one of an exploration fill rate, a feature value conformity rate, field length extremes, and the dictionary code corresponding to the field to be explored.
In one embodiment, whether the data corresponding to the fields to be explored is filled is explored, the number of filled fields is determined, and the percentage of the number of filled fields among the fields to be explored relative to the number of fields to be explored is determined as the exploration fill rate.
The number of filled fields may be the number of fields, among the fields to be explored, that have a filled value. In one embodiment, the determined exploration fill rate is presented in the exploration report. For example, the exploration fill rate may be calculated as:
$$R_1 = \frac{N_1}{M_1} \times 100\%$$
Here $R_1$ denotes the exploration fill rate, $N_1$ denotes the number of filled fields among the fields to be explored, and $M_1$ denotes the number of fields to be explored.
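The fill rate can be computed with a few lines of code. This sketch assumes the values of one field are held in a list and that a value counts as unfilled when it is `None` or blank after stripping whitespace; the source does not state what "filled" means precisely, and the function name `fill_rate` is hypothetical.

```python
def fill_rate(values):
    """Return the percentage of values that are filled (N1 / M1 * 100)."""
    total = len(values)
    if total == 0:
        return 0.0  # assumed convention for an empty field
    filled = sum(1 for v in values if v is not None and str(v).strip() != "")
    return filled / total * 100.0


print(fill_rate(["a", "", None, "b"]))
```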
In one embodiment, the validity of at least one feature value corresponding to the fields to be explored is explored, and the percentage of the number of fields with valid feature values among the fields to be explored relative to the number of fields to be explored is determined as the feature value conformity rate.
In the embodiments of the present application, a feature value is content that can identify information about a real-world entity, and each feature value has exactly one corresponding feature type. The feature value conformity rate is used to explore and analyze how well the data content conforms to the applicable norms; a feature value that conforms to the norm is a valid feature value. For example, before the conformity rate is calculated, the feature type corresponding to the feature value in the current field to be explored may be identified, so as to determine the feature value in that field.
Exemplarily, when the field description of the field to be explored (for example, hotel location, hotel address, or hotel details) indicates that the feature type of the feature value in the current field is hotel address, the hotel address determines that the unique feature value in the current field is hotel. As another example, when the field description (license plate number, license plate digits) indicates that the feature type of the feature value in the current field is license plate number, the license plate number determines that the unique feature value in the current field is license plate.
In one embodiment, in response to determining that the field to be explored belongs to a preset feature type, the feature value of the field to be explored is acquired and feature value verification is performed on it. For example, the feature types that can undergo feature value verification are set in advance as preset feature types; when the feature type of the current feature value is determined to belong to a preset feature type, feature value verification is performed on that feature value. A preset feature type is a feature type whose values can be checked against a regular expression. Before a feature value is verified, the verification method for that feature value is defined; the verification method may be defined in advance or defined according to the actual situation.
Exemplarily, when the license plate number in the above embodiment is determined to be a preset feature type, the feature values in the field to be explored that corresponds to this feature type are verified with a predefined verification method (for example, the check_carnum verification method).
In one embodiment, the feature value conformity rate of the current field to be explored is calculated from the valid feature values that pass verification and all feature values that take part in verification. In one embodiment, when the resulting conformity rate does not meet a preset threshold, the conformity rate is displayed in the generated exploration report. The feature value conformity rate may be calculated as:
$$R_2 = \frac{N_2}{M_2} \times 100\%$$
Here $R_2$ denotes the feature value conformity rate, $N_2$ denotes the number of fields with valid feature values among the fields to be explored, and $M_2$ denotes the number of all feature values in the fields to be explored.
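Feature value verification and the conformity rate can be sketched together. The source names a `check_carnum` verification method but does not give its body, so the regular expression below is a deliberately simplified, hypothetical plate pattern (one CJK character, one uppercase letter, five uppercase letters or digits) and not an official format. Empty values are assumed not to take part in verification.

```python
import re

# Simplified, illustrative pattern only; real plate rules are more involved.
PLATE_PATTERN = re.compile(r"[\u4e00-\u9fa5][A-Z][A-Z0-9]{5}")


def check_carnum(value):
    """Hypothetical body for the check_carnum verification method."""
    return bool(PLATE_PATTERN.fullmatch(value))


def conformity_rate(values, checker):
    """Return the percentage of checked values that pass (N2 / M2 * 100)."""
    checked = [v for v in values if v]  # skip None and empty strings
    if not checked:
        return 0.0
    valid = sum(1 for v in checked if checker(v))
    return valid / len(checked) * 100.0


plates = ["京A12345", "ABC", "沪B6789Z", "12345"]
print(conformity_rate(plates, check_carnum))
```

Passing the checker as a parameter keeps the rate calculation independent of the feature type, so another preset feature type only needs its own verification function.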
In one embodiment, the content length corresponding to the fields to be explored is explored, and the field length extremes of the fields to be explored are determined from the content length.
The field length extremes include the longest field value and the shortest field value. For example, the content lengths of the data corresponding to multiple fields to be explored are determined and compared to find the longest or shortest value among those fields, and the result is presented in the exploration result report generated afterwards.
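The length extremes reduce to a maximum and a minimum over content lengths. This minimal sketch assumes the values of a field are held in a list, that length is measured in characters of the string form, and that `None` values are skipped; the function name `length_extremes` is hypothetical.

```python
def length_extremes(values):
    """Return (longest, shortest) content length, or None if nothing is filled."""
    lengths = [len(str(v)) for v in values if v is not None]
    if not lengths:
        return None
    return max(lengths), min(lengths)


print(length_extremes(["ab", "abcd", None, "a"]))
```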
In one embodiment, the description information of the field to be explored is explored, and the dictionary code corresponding to the field to be explored is determined from the description information.
The dictionary code may be a gender code, a certificate type code, or the like. For example, a preset identification method is used to identify the dictionary code in the description information of the field to be explored, and the identification result is presented in the exploration report. The preset identification method may be a neural network recognition model, or the identification result may be determined from input identification information. In one embodiment, the dictionary code may be presented in the exploration report as an enumeration.
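One simple way to picture dictionary code identification is a keyword lookup on the field description followed by an enumeration of the distinct code values. This is a stand-in chosen for illustration, not the neural network model mentioned above; the keyword table and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical keyword table mapping description keywords to dictionary codes.
DICTIONARY_KEYWORDS = {
    "gender": "gender code",
    "certificate type": "certificate type code",
}


def identify_dictionary_code(description):
    """Return the dictionary code whose keyword appears in the description."""
    text = description.lower()
    for keyword, code_name in DICTIONARY_KEYWORDS.items():
        if keyword in text:
            return code_name
    return None


def enumerate_codes(values):
    """Count each distinct non-empty code value for display as an enumeration."""
    return dict(Counter(v for v in values if v not in (None, "")))


print(identify_dictionary_code("Gender of the guest"))
print(enumerate_codes(["1", "2", "1", None]))
```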
In one embodiment, so that exploration analysts can obtain all exploration results more conveniently, the exploration results are integrated after the exploration results of the fields to be explored have been determined, and an exploration report is generated based on the integrated exploration results and the target data tables.
The exploration report may be presented as an Excel spreadsheet. For example, the exploration report includes an exploration catalog summary sheet and an exploration detail sheet. The catalog summary sheet includes the target data tables and, for each target data table, the amount of data, the number of fields, the number of feature types, and the feature type information. The catalog summary sheet gives the overall statistics of the data currently being explored, so that data analysts can see which data tables are being explored in the current analysis and the basic information of each field in those tables. The exploration detail sheet includes the exploration analysis results of the fields to be explored and sample data for those results. The detail sheet gives the details of the individual exploration results for the current field to be explored, which facilitates open access to the results and lets data analysts use those details to analyze multiple fields in a targeted way and optimize data performance.
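The catalog summary sheet can be sketched as a small tabular export. The source describes an Excel report; to stay within the standard library this sketch writes CSV text instead, which is a substitution made for illustration. The column names and the input record layout are assumptions.

```python
import csv
import io


def build_summary_sheet(tables):
    """Render one summary row per target data table as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["table", "row_count", "field_count",
                     "feature_type_count", "feature_types"])
    for t in tables:
        writer.writerow([
            t["table"],
            t["row_count"],
            t["field_count"],
            len(t["feature_types"]),        # number of feature types found
            ";".join(t["feature_types"]),   # feature type information
        ])
    return buffer.getvalue()


summary = build_summary_sheet([
    {"table": "T_ORDER", "row_count": 100, "field_count": 5,
     "feature_types": ["license plate number"]},
])
print(summary)
```

A detail sheet could be produced the same way, with one row per explored field holding its fill rate, conformity rate, length extremes, and a few sample values.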
In this embodiment of the present application, at least one target data table is matched from at least one database in a connected state according to the exploration range condition; the data structure of the at least one target data table is acquired, and the fields to be explored in the at least one target data table are determined according to that data structure; the fields to be explored are explored respectively based on the preset exploration rules, and the exploration result is determined; the data quality of the fields to be explored is then determined from the field filling, feature value, field length, and field dictionary code exploration results. By providing a data table exploration apparatus, connecting it to at least one database, and efficiently exploring the data structure and data content of the data tables in the database, the embodiment automatically completes the exploration and analysis of data quality in data tables of different database types, is convenient to operate, and improves the efficiency of data exploration.
The following is an embodiment of the data table exploration apparatus provided by the embodiments of the present application. The apparatus belongs to the same inventive concept as the data exploration method of the above embodiment; for details not described in the apparatus embodiment, refer to the above embodiment of the data exploration method.
Embodiment 2
FIG. 2 is a schematic structural diagram of the data table exploration apparatus provided in Embodiment 2 of the present application. This embodiment is applicable to performance testing in software testing. The data table exploration apparatus includes a target data table determination module 210, a to-be-explored field determination module 220, and an exploration result determination module 230, wherein:
the target data table determination module 210 is configured to match at least one target data table from at least one database in a connected state according to an exploration range condition;
the to-be-explored field determination module 220 is configured to acquire the data structure of the at least one target data table, and to determine the fields to be explored in the at least one target data table according to the data structure of the at least one target data table; and
the exploration result determination module 230 is configured to explore the fields to be explored respectively based on preset exploration rules, and to determine an exploration result, wherein the exploration rules include at least one of a field filling exploration rule, a feature value exploration rule, a field length exploration rule, and a field dictionary code exploration rule.
In this embodiment of the present application, at least one target data table is matched from at least one database in a connected state according to the exploration range condition; the data structure of the at least one target data table is acquired, and the fields to be explored in the at least one target data table are determined according to that data structure; the fields to be explored are explored respectively based on the preset exploration rules, and the exploration result is determined; the data quality of the fields to be explored is then determined from the field filling, feature value, field length, and field dictionary code exploration results. By providing a data table exploration apparatus, connecting it to at least one database, and efficiently exploring the structure and data content of the data tables in the database, the embodiment automatically completes the exploration and analysis of data quality in data tables of different database types, is convenient to operate, and improves the efficiency of data exploration.
在上述实施例的基础上,所述数据表探查装置还包括:On the basis of the above embodiment, the data table detection device further includes:
数据库连接单元,设置为读取配置文件,根据所述配置文件中至少一个数据库的连接方式对所述至少一个数据库进行连接。The database connection unit is configured to read the configuration file, and connect the at least one database according to the connection mode of the at least one database in the configuration file.
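As an illustration only, the behaviour of the database connection unit might be sketched as follows. The JSON layout, the key names `name`, `driver` and `target`, and the use of in-memory SQLite databases are all assumptions made for this sketch; the application does not specify a configuration format or a database type.

```python
import json
import sqlite3

# Hypothetical configuration text; the key names are assumptions, not from the application.
CONFIG_TEXT = """
{"databases": [
  {"name": "demo_a", "driver": "sqlite", "target": ":memory:"},
  {"name": "demo_b", "driver": "sqlite", "target": ":memory:"}
]}
"""

# Map each connection mode named in the configuration to a connector function.
CONNECTORS = {"sqlite": sqlite3.connect}


def connect_all(config_text):
    """Open one connection per configured database and return them keyed by name."""
    config = json.loads(config_text)
    connections = {}
    for entry in config["databases"]:
        connector = CONNECTORS.get(entry["driver"])
        if connector is None:
            raise ValueError(f"unsupported connection mode: {entry['driver']}")
        connections[entry["name"]] = connector(entry["target"])
    return connections


connections = connect_all(CONFIG_TEXT)
```

Supporting a further database type in this sketch would only require registering another connector function in `CONNECTORS`.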
On the basis of the above embodiment, the target data table determination module 210 includes:
a target data table determination unit, configured to acquire the probe scope condition and a matching mode, generate a matching instruction based on the probe scope condition and the matching mode, and execute the matching instruction to determine at least one target data table in the at least one database, wherein the matching mode includes any one of an exact match parameter, a fuzzy match parameter, an exact exclusion parameter, and a fuzzy exclusion parameter.
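One possible reading of "generating a matching instruction" is building a catalog query from the probe scope condition and the matching mode. The sketch below assumes SQLite and its `sqlite_master` catalog; the mode names and the mapping to SQL predicates are hypothetical and are not taken from the application.

```python
import sqlite3

# Hypothetical mapping from matching mode to an SQL predicate on the table name.
PREDICATES = {
    "exact_match": "name = ?",
    "fuzzy_match": "name LIKE ?",
    "exact_exclude": "name <> ?",
    "fuzzy_exclude": "name NOT LIKE ?",
}


def build_match_instruction(scope_condition, mode):
    """Return (sql, params) selecting table names that satisfy the matching mode."""
    sql = (
        "SELECT name FROM sqlite_master WHERE type = 'table' AND "
        + PREDICATES[mode]
        + " ORDER BY name"
    )
    # Fuzzy modes wrap the condition in wildcards; exact modes use it unchanged.
    param = f"%{scope_condition}%" if mode.startswith("fuzzy") else scope_condition
    return sql, (param,)


def match_tables(conn, scope_condition, mode):
    """Execute the matching instruction and return the matched table names."""
    sql, params = build_match_instruction(scope_condition, mode)
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
for table in ("user_info", "user_log", "orders"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER)")
```

Passing the scope condition as a bound parameter rather than concatenating it into the SQL text keeps the generated instruction safe against malformed conditions.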
On the basis of the above embodiment, the data table exploration apparatus further includes:
a probe data determination unit, configured to determine the number of fields to be probed and, in response to that number being greater than a preset number, to sample the data of the fields to be probed and take the sampled data as the data to be probed for those fields.
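The sampling step might look like the following minimal sketch. It reads "the number of fields to be probed" as the number of values held by a field, and it draws a sample whose size equals the preset number; both points are assumptions, since the application does not state the sampling strategy.

```python
import random


def select_probe_data(values, preset_number, seed=None):
    """Return all values if they fit within the preset number, else a random sample."""
    values = list(values)
    if len(values) <= preset_number:
        return values
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(values, preset_number)
```

For example, `select_probe_data(range(1000), 100)` would return 100 values drawn without replacement, while a field with only three values would be probed in full.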
On the basis of the above embodiment, the probe result determination module 230 includes:
a first probe result determination unit, configured to probe whether the data to be probed corresponding to the fields to be probed is filled, determine the number of filled fields, and determine the probe fill rate as the percentage of the number of filled fields among the fields to be probed relative to the number of fields to be probed;
a second probe result determination unit, configured to probe the validity of at least one feature value corresponding to the fields to be probed, and determine the feature value conformity rate as the percentage of the number of fields with valid feature values relative to the number of fields to be probed;
a third probe result determination unit, configured to probe the content length corresponding to the fields to be probed and determine the extreme values of the field length of the fields to be probed according to the content length;
a fourth probe result determination unit, configured to probe the description information of the fields to be probed and determine the dictionary code corresponding to the fields to be probed according to the description information.
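The fill rate and field length probes described above can be sketched as below. Treating `None` and blank strings as "not filled", and reporting both the minimum and the maximum length as the extreme values, are assumptions of this sketch rather than definitions from the application.

```python
def is_filled(value):
    """Assumed definition of 'filled': not None and not a blank string."""
    return value is not None and str(value).strip() != ""


def fill_rate(values):
    """Percentage of probed values that are filled."""
    if not values:
        return 0.0
    filled = sum(1 for v in values if is_filled(v))
    return 100.0 * filled / len(values)


def length_extremes(values):
    """Return (min_length, max_length) over the filled values, or None if none are filled."""
    lengths = [len(str(v)) for v in values if is_filled(v)]
    if not lengths:
        return None
    return min(lengths), max(lengths)


sample = ["alice", "", None, "bob", "christina"]
```

With the `sample` list above, three of the five values are filled, so the fill rate is 60 percent, and the filled values range from 3 to 9 characters in length.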
On the basis of the above embodiment, the second probe result determination unit includes:
a feature value verification unit, configured to, in response to determining that a field to be probed belongs to a preset feature type, acquire the feature value of the field to be probed and perform feature value verification on that feature value.
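A feature value verification step could be sketched as follows. The two preset feature types used here, `ipv4` and `date`, are hypothetical examples chosen for illustration; the application does not list which feature types are preset or how each one is verified.

```python
import ipaddress
from datetime import datetime


def _is_ipv4(value):
    """True if the value parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


def _is_date(value):
    """True if the value parses as a YYYY-MM-DD calendar date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (ValueError, TypeError):
        return False


# Hypothetical preset feature types and their validators.
VALIDATORS = {"ipv4": _is_ipv4, "date": _is_date}


def conformity_rate(feature_type, values):
    """Percentage of valid values, or None when the type is not a preset feature type."""
    validator = VALIDATORS.get(feature_type)
    if validator is None or not values:
        return None
    valid = sum(1 for v in values if validator(v))
    return 100.0 * valid / len(values)
```

Returning `None` for a field outside the preset feature types mirrors the "in response to determining" condition: such fields are simply not subjected to feature value verification.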
On the basis of the above embodiment, the data table exploration apparatus is further configured to integrate the probe results and to generate a probe report based on the integrated probe results and the target data table.
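Integrating the probe results into a report might be sketched as below. The record layout with the keys `field`, `rule` and `value`, and the nested dictionary used as the report, are assumptions of this sketch; the application does not describe the report format.

```python
import json


def build_report(table_name, field_results):
    """Group individual probe results by field under the target data table."""
    report = {"table": table_name, "fields": {}}
    for result in field_results:
        entry = report["fields"].setdefault(result["field"], {})
        entry[result["rule"]] = result["value"]
    return report


# Hypothetical probe results for one target data table.
results = [
    {"field": "phone", "rule": "fill_rate", "value": 50.0},
    {"field": "phone", "rule": "max_length", "value": 11},
    {"field": "name", "rule": "fill_rate", "value": 75.0},
]
report = build_report("user_info", results)
report_text = json.dumps(report, indent=2, sort_keys=True)
```

Serialising the integrated structure, here with `json.dumps`, is one simple way to hand the report to a downstream viewer.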
The data table exploration apparatus provided by the embodiments of the present application can execute the data exploration method provided by any embodiment of the present application, and has functional modules corresponding to the executed method.
The units and modules included in the above embodiment of the data table exploration apparatus are divided only according to functional logic, and the division is not limited to the one described above; likewise, the names of the functional units serve only to distinguish them from one another.
Embodiment 3
FIG. 3 is a schematic structural diagram of an electronic device provided in Embodiment 3 of the present application. FIG. 3 shows a block diagram of an exemplary electronic device 12 suitable for implementing embodiments of the present application. The electronic device 12 shown in FIG. 3 is only an example.
As shown in FIG. 3, the electronic device 12 takes the form of a general-purpose computing device. The components of the electronic device 12 may include: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The system memory 28 may be the memory 28.
The bus 18 represents at least one type of bus structure; for example, the bus 18 includes a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures. By way of example, these architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The electronic device 12 typically includes a variety of computer system readable media. These media may be any available media that can be accessed by the electronic device 12, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer system readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache 32. The electronic device 12 may include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be configured to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 3, commonly referred to as a hard disk drive). Although not shown in FIG. 3, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a floppy disk) and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 40 having a set of (for example, at least one) program modules 42 may be stored in, for example, the memory 28. Such program modules 42 include an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.
The electronic device 12 may also communicate with one or more external devices 14 (for example, a keyboard, a pointing device, a display 24, and so on), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (for example, a network card, a modem, and so on) that enables the electronic device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the electronic device 12 may communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 20. As shown in FIG. 3, the network adapter 20 communicates with the other modules of the electronic device 12 via the bus 18. Although not shown in FIG. 3, other hardware and/or software modules may be used in conjunction with the electronic device 12, including: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
By running programs stored in the memory 28, the processing unit 16 executes various functional applications and sample data acquisition operations, for example implementing the steps of the data exploration method provided by the embodiments of the present application. The data exploration method includes:
matching at least one target data table from at least one database in a connected state according to a probe scope condition;
acquiring the data structure of the at least one target data table, and determining the fields to be probed in the at least one target data table according to the data structure of the at least one target data table;
probing each of the fields to be probed based on preset probe rules, and determining probe results, wherein the probe rules include at least one of a field fill probe rule, a feature value probe rule, a field length probe rule, and a field dictionary code probe rule.
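The three steps above can be sketched end to end against a single SQLite database. Everything specific here is an assumption made for illustration: the use of SQLite, the `LIKE` pattern as the probe scope condition, the treatment of `None` and empty strings as unfilled, and the choice of the fill and length rules as the only rules applied.

```python
import sqlite3


def probe_database(conn, name_pattern):
    """Match tables, read their structure, and probe every field of each match."""
    # Step 1: match target data tables by the probe scope condition.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE ? "
        "ORDER BY name", (name_pattern,))]
    results = {}
    for table in tables:
        # Step 2: PRAGMA table_info yields one row per column; index 1 is its name.
        fields = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for field in fields:
            # Step 3: apply the fill and length probe rules to the field's values.
            values = [row[0] for row in conn.execute(f'SELECT "{field}" FROM "{table}"')]
            filled = [v for v in values if v is not None and str(v) != ""]
            lengths = [len(str(v)) for v in filled]
            results[(table, field)] = {
                "fill_rate": 100.0 * len(filled) / len(values) if values else 0.0,
                "min_length": min(lengths) if lengths else None,
                "max_length": max(lengths) if lengths else None,
            }
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_info (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO user_info VALUES (?, ?)",
    [("alice", "13800000000"), ("bob", None), ("", "139"), ("dan", "")],
)
conn.execute("CREATE TABLE orders (id INTEGER)")
results = probe_database(conn, "user%")
```

In this example only `user_info` matches the scope condition, so `orders` is never read; the `name` field comes out 75 percent filled and the `phone` field 50 percent filled.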
Embodiment 4
Embodiment 4 provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the data exploration method provided by the above embodiments of the present application. For example, the data exploration method includes:
matching at least one target data table from at least one database in a connected state according to a probe scope condition;
acquiring the data structure of the at least one target data table, and determining the fields to be probed in the at least one target data table according to the data structure of the at least one target data table;
probing each of the fields to be probed based on preset probe rules, and determining probe results, wherein the probe rules include at least one of a field fill probe rule, a feature value probe rule, a field length probe rule, and a field dictionary code probe rule.
The computer storage medium provided by the embodiments of the present application may adopt any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a memory, an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by, or in combination with, an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, the data signal carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium may send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted over any suitable medium, including wireless, wire, optical cable, radio frequency (RF), and so on, or any suitable combination of the above.
Computer program code for carrying out the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as C or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer, for example through the Internet using an Internet service provider.
Those of ordinary skill in the art should understand that the modules or steps of the present application described above may be implemented with a general-purpose computing device, and may be concentrated on a single computing device or distributed over a network composed of multiple computing devices. In an embodiment, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be fabricated as a separate integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. The embodiments of the present application thus admit various combinations of hardware and software.

Claims (16)

  1. A data exploration method, applied to a data table exploration apparatus, comprising:
    matching at least one target data table from at least one database in a connected state according to a probe scope condition;
    acquiring the data structure of the at least one target data table, and determining the fields to be probed in the at least one target data table according to the data structure of the at least one target data table;
    probing each of the fields to be probed based on preset probe rules, and determining probe results, wherein the probe rules comprise at least one of a field fill probe rule, a feature value probe rule, a field length probe rule, and a field dictionary code probe rule.
  2. The method according to claim 1, further comprising, before matching at least one target data table from at least one database in a connected state according to the probe scope condition:
    reading a configuration file, and connecting to the at least one database according to the connection mode of the at least one database specified in the configuration file.
  3. The method according to claim 1, wherein matching at least one target data table from at least one database in a connected state according to the probe scope condition comprises:
    acquiring the probe scope condition and a matching mode, generating a matching instruction based on the probe scope condition and the matching mode, and executing the matching instruction to determine at least one target data table in the at least one database, wherein the matching mode comprises any one of an exact match parameter, a fuzzy match parameter, an exact exclusion parameter, and a fuzzy exclusion parameter.
  4. The method according to claim 1, further comprising, after determining the fields to be probed in the at least one target data table according to the data structure of the at least one target data table:
    determining the number of fields to be probed and, in response to that number being greater than a preset number, sampling the data of the fields to be probed and taking the sampled data as the data to be probed for those fields.
  5. The method according to claim 4, wherein probing each of the fields to be probed based on the preset probe rules and determining the probe results comprises:
    probing whether the data to be probed corresponding to the fields to be probed is filled, determining the number of filled fields, and determining the probe fill rate as the percentage of the number of filled fields among the fields to be probed relative to the number of fields to be probed; and/or,
    probing the validity of at least one feature value corresponding to the fields to be probed, and determining the feature value conformity rate as the percentage of the number of fields with valid feature values relative to the number of fields to be probed; and/or,
    probing the content length corresponding to the fields to be probed, and determining the extreme values of the field length of the fields to be probed according to the content length; and/or,
    probing the description information of the fields to be probed, and determining the dictionary code corresponding to the fields to be probed according to the description information.
  6. The method according to claim 5, wherein probing the validity of at least one feature value corresponding to the fields to be probed comprises:
    in response to determining that a field to be probed belongs to a preset feature type, acquiring the feature value of the field to be probed and performing feature value verification on the feature value.
  7. The method according to claim 1, further comprising, after determining the probe results:
    integrating the probe results, and generating a probe report based on the integrated probe results and the target data table.
  8. A data table exploration apparatus, comprising:
    a target data table determination module, configured to match at least one target data table from at least one database in a connected state according to a probe scope condition;
    a to-be-probed field determination module, configured to acquire the data structure of the at least one target data table and to determine the fields to be probed in the at least one target data table according to the data structure of the at least one target data table;
    a probe result determination module, configured to probe each of the fields to be probed based on preset probe rules and to determine probe results, wherein the probe rules comprise at least one of a field fill probe rule, a feature value probe rule, a field length probe rule, and a field dictionary code probe rule.
  9. The data table exploration apparatus according to claim 8, further comprising:
    a database connection unit, configured to read a configuration file and to connect to the at least one database according to the connection mode of the at least one database specified in the configuration file.
  10. The data table exploration apparatus according to claim 8, wherein the target data table determination module comprises:
    a target data table determination unit, configured to acquire the probe scope condition and a matching mode, generate a matching instruction based on the probe scope condition and the matching mode, and execute the matching instruction to determine at least one target data table in the at least one database, wherein the matching mode comprises any one of an exact match parameter, a fuzzy match parameter, an exact exclusion parameter, and a fuzzy exclusion parameter.
  11. The data table exploration apparatus according to claim 8, further comprising:
    a probe data determination unit, configured to determine the number of fields to be probed and, in response to that number being greater than a preset number, to sample the data of the fields to be probed and take the sampled data as the data to be probed for those fields.
  12. The data table exploration apparatus according to claim 11, wherein the probe result determination module comprises:
    a first probe result determination unit, configured to probe whether the data to be probed corresponding to the fields to be probed is filled, determine the number of filled fields, and determine the probe fill rate as the percentage of the number of filled fields among the fields to be probed relative to the number of fields to be probed;
    a second probe result determination unit, configured to probe the validity of at least one feature value corresponding to the fields to be probed, and determine the feature value conformity rate as the percentage of the number of fields with valid feature values relative to the number of fields to be probed;
    a third probe result determination unit, configured to probe the content length corresponding to the fields to be probed and determine the extreme values of the field length of the fields to be probed according to the content length;
    a fourth probe result determination unit, configured to probe the description information of the fields to be probed and determine the dictionary code corresponding to the fields to be probed according to the description information.
  13. The data table exploration apparatus according to claim 12, wherein the second probe result determination unit comprises:
    a feature value verification unit, configured to, in response to determining that a field to be probed belongs to a preset feature type, acquire the feature value of the field to be probed and perform feature value verification on the feature value.
  14. The data table exploration apparatus according to claim 8, further configured to integrate the probe results and to generate a probe report based on the integrated probe results and the target data table.
  15. An electronic device, comprising:
    one or more processors;
    a storage device, configured to store one or more programs,
    wherein the one or more processors are configured to execute the one or more programs to implement the data exploration method according to any one of claims 1-7.
  16. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the data exploration method according to any one of claims 1-7.
PCT/CN2021/109589 2020-12-11 2021-07-30 Data exploration method and apparatus, and electronic device and storage medium WO2022121337A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011462110.1A CN112559523A (en) 2020-12-11 2020-12-11 Data detection method and device, electronic equipment and storage medium
CN202011462110.1 2020-12-11

Publications (1)

Publication Number Publication Date
WO2022121337A1 true WO2022121337A1 (en) 2022-06-16

Family

ID=75062769

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109589 WO2022121337A1 (en) 2020-12-11 2021-07-30 Data exploration method and apparatus, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN112559523A (en)
WO (1) WO2022121337A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559523A (en) * 2020-12-11 2021-03-26 北京锐安科技有限公司 Data detection method and device, electronic equipment and storage medium
CN113722325A (en) * 2021-08-31 2021-11-30 北京锐安科技有限公司 Method and device for detecting table information in database, computer equipment and storage medium
CN113961571B (en) * 2021-12-22 2022-03-22 太极计算机股份有限公司 Multi-mode data sensing method and device based on data probe

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106708909A (en) * 2015-11-18 2017-05-24 阿里巴巴集团控股有限公司 Data quality detection method and apparatus
CN108389621A (en) * 2018-02-08 2018-08-10 山东康网网络科技有限公司 Medical record database quality determining method and system
CN110737650A (en) * 2019-09-27 2020-01-31 北京明略软件系统有限公司 Data quality detection method and device
CN112559523A (en) * 2020-12-11 2021-03-26 北京锐安科技有限公司 Data detection method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116131860A (en) * 2022-12-28 2023-05-16 山东华科信息技术有限公司 Data compression system and data compression method for distributed energy grid-connected monitoring
CN116131860B (en) * 2022-12-28 2023-09-05 山东华科信息技术有限公司 Data compression system and data compression method for distributed energy grid-connected monitoring

Also Published As

Publication number Publication date
CN112559523A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
WO2022121337A1 (en) Data exploration method and apparatus, and electronic device and storage medium
CN110968985B (en) Method and device for determining integrated circuit repair algorithm, storage medium and electronic equipment
CN111343161B (en) Abnormal information processing node analysis method, abnormal information processing node analysis device, abnormal information processing node analysis medium and electronic equipment
CN112613584A (en) Fault diagnosis method, device, equipment and storage medium
WO2022068316A1 (en) Data reconciliation method and apparatus, device, and storage medium
CN110647447A (en) Abnormal instance detection method, apparatus, device and medium for distributed system
CN110647523B (en) Data quality analysis method and device, storage medium and electronic equipment
CN111523764B (en) Service architecture detection method, device, tool, electronic equipment and medium
CN117593115A (en) Feature value determining method, device, equipment and medium of credit risk assessment model
CN115022201B (en) Data processing function test method, device, equipment and storage medium
WO2022062834A1 (en) Data exploration method and apparatus, electronic device and storage medium
CN113792138B (en) Report generation method and device, electronic equipment and storage medium
CN116185393A (en) Method, device, equipment, medium and product for generating interface document
CN113238940B (en) Interface test result comparison method, device, equipment and storage medium
CN114942905A (en) Migration data verification method, device, equipment and storage medium
US11520831B2 (en) Accuracy metric for regular expression
CN111427874B (en) Quality control method and device for medical data production and electronic equipment
CN112214469A (en) Drive test data processing method, device, server and storage medium
CN113656391A (en) Data detection method and device, storage medium and electronic equipment
CN110866557B (en) Data evaluation method and device, storage medium and electronic device
CN115292146B (en) System capacity estimation method, system, equipment and storage medium
CN117873860A (en) Data automatic testing method and device, electronic equipment and storage medium
CN115599681A (en) Interface test method, device, equipment and storage medium
CN117439928A (en) Link testing method and device of service system, electronic equipment and storage medium
CN117093494A (en) Test processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21902047; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21902047; Country of ref document: EP; Kind code of ref document: A1)