CN115510102A - Data analysis rule generation method and device based on data architecture - Google Patents

Data analysis rule generation method and device based on data architecture

Info

Publication number
CN115510102A
Authority
CN
China
Prior art keywords
data
field
automatically
data model
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211142707.7A
Other languages
Chinese (zh)
Inventor
刘晨
孙星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Encore Beijing Information Technology Co ltd
Original Assignee
Encore Beijing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Encore Beijing Information Technology Co ltd
Priority to CN202211142707.7A
Publication of CN115510102A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Embodiments of this specification provide a data profiling rule generation method and device based on a data architecture. The method comprises: establishing a data model based on a data architecture tool; acquiring data model information of the data model, identifying a usage scenario and a management topic of the data model based on the data model information, automatically creating different management scenarios, and automatically matching different business rule templates; and automatically identifying the constraints on the fields in the data model, automatically generating different profiling rules according to the different constraints, and generating a profiling result report.

Description

Data analysis rule generation method and device based on data architecture
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data parsing rule generating method and device based on a data architecture.
Background
Data profiling is a form of data analysis used to inspect data and assess its quality. It uses statistical techniques to discover the true structure, content, and quality of a data set (Olson, 2003). The profiling engine produces statistics that an analyst can use to identify patterns in data content and structure. For example:
1) Number of nulls. Identify whether nulls exist and check whether they are allowed.
2) Maximum/minimum value. Identify outliers, such as negative values.
3) Maximum/minimum length. Identify outliers or invalid values for fields with specific length requirements.
4) Frequency distribution of values in individual columns. Assess reasonableness (e.g., the distribution of country codes for transactions, values that occur unusually often or rarely, and the percentage of records populated with default values).
5) Data type and format. Identify the level of non-conformance to format requirements, as well as unexpected formats (e.g., number of decimal places, embedded spaces, sample values).
Profiling also includes cross-column analysis, which can identify overlapping or duplicate columns and expose embedded value dependencies. Inter-table analysis explores overlapping value sets and helps identify foreign key relationships. Most data profiling tools allow drilling down into the analyzed data for further investigation.
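The column-level statistics listed above can be sketched in a few lines of Python. This is only an illustration of the general idea of profiling, not the engine of any particular tool; the function name `profile_column` and the sample values are assumptions made for the example.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Compute basic profiling statistics for one column (None stands for null)."""
    non_null = [v for v in values if v is not None]
    # Lengths are measured on the string form of each value.
    lengths = [len(str(v)) for v in non_null]
    # Only real numbers take part in the min/max check; bool is excluded on purpose.
    numeric = [v for v in non_null
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return {
        "null_count": len(values) - len(non_null),
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "frequencies": Counter(non_null),
        "types": Counter(type(v).__name__ for v in non_null),
    }


# Hypothetical "amount" column: one null, one negative outlier, one repeated value.
amounts = [120.5, -3.0, None, 120.5, 40]
stats = profile_column(amounts)
print(stats["null_count"], stats["min"], stats["max"])
```

An analyst would still have to decide whether the negative minimum or the mixed `int`/`float` types are acceptable; the statistics only surface the candidates.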
The analyst must evaluate the results from the profiling engine to determine whether the data conforms to rules and other requirements. A good analyst can use profiling results to confirm known relationships and to uncover hidden characteristics and patterns within and between data sets, including business rules and validity constraints. Profiling is typically used as part of data discovery in projects (especially data integration projects) or to assess the current state of data that is targeted for improvement. Profiling results can be used to identify opportunities to improve the quality of both data and metadata (Olson, 2003).
While profiling is an effective way to understand data, it is only a first step toward improving data quality: it enables an organization to identify potential problems. Solving those problems requires other forms of analysis, including business process analysis, data lineage analysis, and deeper data analysis, which help isolate the root causes of the problems.
As enterprise data transformation becomes widespread, enterprises build more and more systems, and these systems generate ever larger volumes of data. How to guarantee the quality of this mass data and realize its true value has therefore become a matter of general concern.
One of the key techniques for improving data quality is to check the data periodically against predefined profiling rules. The check results or reports are then reviewed manually, problem data is screened out, and the problems are rectified.
At present, profiling rules are written mainly by system builders or related technical personnel, who compile them by hand based on past experience and their knowledge of the system's data and business indicators. This process consumes a great deal of labor and time, and rules are easily omitted or written incorrectly, so data quality monitoring is incomplete. In addition, the results of rule-based checks are not necessarily fed back to the relevant personnel promptly and effectively, so the same quality problem may recur.
Disclosure of Invention
The present invention provides a data profiling rule generation method and device based on a data architecture, and aims to solve the above problems in the prior art.
The invention provides a data profiling rule generation method based on a data architecture, comprising:
establishing a data model based on a data architecture tool;
acquiring data model information of the data model, identifying a usage scenario and a management topic of the data model based on the data model information, automatically creating different management scenarios, and automatically matching different business rule templates; and
automatically identifying the constraints on the fields in the data model, automatically generating different profiling rules according to the different constraints, and generating a profiling result report.
The invention provides a data profiling rule generation device based on a data architecture, comprising:
an establishing module, configured to establish a data model based on a data architecture tool;
a matching module, configured to acquire data model information of the data model, identify a usage scenario and a management topic of the data model based on the data model information, automatically create different management scenarios, and automatically match different business rule templates; and
a generating module, configured to automatically identify the constraints on the fields in the data model, automatically generate different profiling rules according to the different constraints, and generate a profiling result report.
An embodiment of the present invention further provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the above data profiling rule generation method based on a data architecture.
An embodiment of the present invention further provides a computer-readable storage medium on which an implementation program for information transfer is stored, wherein the program, when executed by a processor, implements the steps of the above data profiling rule generation method based on a data architecture.
With the embodiments of the present invention, labor and time costs can be reduced and the coverage and accuracy of checks can be improved.
Drawings
To illustrate one or more embodiments of this specification or the prior-art solutions more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments described in this specification, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a data profiling rule generation method based on a data architecture according to an embodiment of the present invention;
FIG. 2 is a flowchart of a detailed process of a data profiling rule generation method based on a data architecture according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data profiling rule generating apparatus based on a data architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions in one or more embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in one or more embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all embodiments. All other embodiments that can be derived by a person skilled in the art from one or more of the embodiments described herein without making any inventive step shall fall within the scope of protection of this document.
Method embodiment
According to an embodiment of the present invention, a data profiling rule generation method based on a data architecture is provided. FIG. 1 is a flowchart of the method. As shown in FIG. 1, the method specifically includes:
Step S101, establishing a data model based on a data architecture tool. The data architecture tool is a data modeling tool that assists professional modelers in drawing logical and physical data models and that can generate DDL statements.
Step S102, acquiring data model information of the data model, identifying a usage scenario and a management topic of the data model based on the data model information, automatically creating different management scenarios, and automatically matching different business rule templates.
Step S103, automatically identifying the constraints on the fields in the data model, automatically generating different profiling rules according to the different constraints, and generating a profiling result report. Step S103 specifically includes:
when the field is a primary key field, automatically generating profiling rules for non-null and unique values: when the primary key consists of a single attribute, only checking whether that field contains duplicate values and whether it contains null values; when the primary key consists of multiple attributes, jointly determining whether the multiple fields contain duplicate values and whether they contain null values;
when the field is a non-null field, automatically generating a profiling rule of the null-check class, that is, checking whether the field is null;
when the field is a relationship/foreign key, automatically generating an association profiling rule, with the association check derived from the dependency established by the relationship.
After step S103, the profiling results report is displayed back into the data architecture tool.
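As a rough illustration of step S103, the mapping from field constraints to profiling rules can be sketched as follows. The patent does not specify a rule format, so expressing each rule as a SQL query that returns violating rows is an assumption, as are the names `FieldSpec`, `TableSpec`, and `generate_rules` and the treatment of a null in any key column as a violation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FieldSpec:
    name: str
    primary_key: bool = False
    not_null: bool = False
    # (parent_table, parent_column) when the field is a relationship/foreign key
    references: Optional[Tuple[str, str]] = None


@dataclass
class TableSpec:
    name: str
    fields: List[FieldSpec] = field(default_factory=list)


def generate_rules(table: TableSpec) -> Dict[str, str]:
    """Map each constraint in the model to a SQL query that returns violating rows."""
    rules: Dict[str, str] = {}
    t = table.name

    pk = [col.name for col in table.fields if col.primary_key]
    if pk:
        cols = ", ".join(pk)
        # Uniqueness over a single column or over the combination of all key columns.
        rules[f"{t}.pk_unique"] = (
            f"SELECT {cols}, COUNT(*) AS n FROM {t} GROUP BY {cols} HAVING COUNT(*) > 1"
        )
        # Assumption: a null in any key column counts as a violation.
        nulls = " OR ".join(f"{c} IS NULL" for c in pk)
        rules[f"{t}.pk_not_null"] = f"SELECT * FROM {t} WHERE {nulls}"

    for col in table.fields:
        if col.not_null and not col.primary_key:
            rules[f"{t}.{col.name}.not_null"] = (
                f"SELECT * FROM {t} WHERE {col.name} IS NULL"
            )
        if col.references:
            parent, parent_col = col.references
            # Association check: child values that have no matching parent row.
            rules[f"{t}.{col.name}.fk"] = (
                f"SELECT c.* FROM {t} c LEFT JOIN {parent} p "
                f"ON c.{col.name} = p.{parent_col} "
                f"WHERE c.{col.name} IS NOT NULL AND p.{parent_col} IS NULL"
            )
    return rules


# Hypothetical model: an order table with a composite key and a customer reference.
orders = TableSpec("orders", [
    FieldSpec("order_id", primary_key=True),
    FieldSpec("line_no", primary_key=True),
    FieldSpec("customer_id", not_null=True, references=("customers", "id")),
])
rules = generate_rules(orders)
for name, sql in rules.items():
    print(name, "->", sql)
```

Keeping each rule as a query that returns the offending rows means an empty result is a pass, which makes the later report a simple count per rule.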
The above-described technical means of the embodiments of the present invention will be described in detail below.
As shown in fig. 2, the method specifically includes the following steps:
Step 1: establish a data model by means of a data architecture tool. The data architecture tool can be understood as a data modeling tool: software that assists professional modelers in drawing logical and physical data models and that can generate DDL statements. Modeling tools currently common in the industry include PowerDesigner and ERWin; relatively mature domestic tools include Datablau-DDM, weavertir, and the like.
Step 2: acquire the data model information, identify the usage scenario and management topic of the data model, automatically create different management scenarios, and automatically match different business rule templates. For example, a data model of a system related to the regulatory submission topic is initialized, and business rules are configured based on that model.
Step 3: automatically identify the constraints on the fields in the model and automatically generate different profiling rules according to the different constraints, for example:
1. "Primary key field": profiling rules for non-null and unique values are generated automatically.
When the primary key consists of a single attribute, only that field is checked for duplicate values and for null values.
When the primary key consists of multiple attributes, the fields must be checked jointly for duplicate values and for null values.
2. "Non-null field": a profiling rule of the null-check class is generated automatically, that is, the field is checked for being empty (no data entered).
3. "Relationship/foreign key": an association profiling rule is generated automatically. The association check is derived from the dependency established by the relationship. For example, given an order table and a customer table, where the order table carries customer information, the customer information in the order table must be checked to confirm that it exists in the customer table.
Step 4: finally, the profiling result report is displayed back in the data architecture tool for reference by data architects.
According to the embodiment of the invention, the data architecture information is acquired through the data architecture tool, the constraint conditions of the fields in the data architecture are identified, and the data analysis rule can be automatically and efficiently generated according to different scenes and the constraint conditions, so that the aims of reducing labor and time costs and improving the coverage rate and accuracy rate are fulfilled.
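The order table and customer table example can be made concrete with a small, self-contained sketch that runs three kinds of checks and builds a result report. SQLite, the table and column names, the sample rows, and the report layout are all assumptions chosen for illustration; the rules are written out by hand here rather than generated.

```python
import sqlite3
from typing import Dict, List

# Hand-written rules of the three kinds described above; each query returns violating rows.
RULES: Dict[str, str] = {
    "orders.pk_unique": (
        "SELECT order_id, line_no FROM orders "
        "GROUP BY order_id, line_no HAVING COUNT(*) > 1"
    ),
    "orders.pk_not_null": (
        "SELECT * FROM orders WHERE order_id IS NULL OR line_no IS NULL"
    ),
    "orders.customer_id.not_null": (
        "SELECT * FROM orders WHERE customer_id IS NULL"
    ),
    "orders.customer_id.fk": (
        "SELECT o.* FROM orders o LEFT JOIN customers c ON o.customer_id = c.id "
        "WHERE o.customer_id IS NOT NULL AND c.id IS NULL"
    ),
}


def run_rules(conn: sqlite3.Connection, rules: Dict[str, str]) -> List[dict]:
    """Execute every rule and record how many violating rows it returned."""
    report = []
    for name, sql in rules.items():
        violations = len(conn.execute(sql).fetchall())
        report.append({"rule": name, "violations": violations, "passed": violations == 0})
    return report


conn = sqlite3.connect(":memory:")
# The tables are created without constraints so that faulty rows can be loaded and then detected.
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, line_no INTEGER, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bo');
INSERT INTO orders VALUES (10, 1, 1), (10, 1, 2), (11, 1, NULL), (12, 1, 9);
""")

report = run_rules(conn, RULES)
for row in report:
    print(row)
```

In this sample data the duplicated key (10, 1), the missing customer on order 11, and the unknown customer 9 on order 12 each trigger exactly one rule, while the key null check passes.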
Apparatus embodiment one
According to an embodiment of the present invention, a data profiling rule generation device based on a data architecture is provided. FIG. 3 is a schematic diagram of the device. As shown in FIG. 3, the device specifically includes:
an establishing module 30, configured to establish a data model based on a data architecture tool. The data architecture tool is a data modeling tool that assists professional modelers in drawing logical and physical data models and that can generate DDL statements;
a matching module 32, configured to acquire data model information of the data model, identify a usage scenario and a management topic of the data model based on the data model information, automatically create different management scenarios, and automatically match different business rule templates;
a generating module 34, configured to automatically identify the constraints on the fields in the data model, automatically generate different profiling rules according to the different constraints, and generate a profiling result report. The generating module 34 is specifically configured to:
when the field is a primary key field, automatically generate profiling rules for non-null and unique values: when the primary key consists of a single attribute, only check whether that field contains duplicate values and whether it contains null values; when the primary key consists of multiple attributes, jointly determine whether the multiple fields contain duplicate values and whether they contain null values;
when the field is a non-null field, automatically generate a profiling rule of the null-check class, that is, check whether the field is null;
when the field is a relationship/foreign key, automatically generate an association profiling rule, with the association check derived from the dependency established by the relationship.
The device may further include a display-back module, configured to display the profiling result report back in the data architecture tool.
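The description does not say how the matching module 32 identifies a management topic or selects business rule templates. Purely as an illustration, the sketch below assumes a keyword match on table names and a small hand-made template catalogue; `TEMPLATES`, `KEYWORDS`, `identify_topic`, and `match_templates` are all hypothetical names, and a real implementation could use entirely different model information.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical catalogue: management topic -> business rule templates.
TEMPLATES: Dict[str, List[str]] = {
    "regulatory_submission": ["completeness", "timeliness", "consistency"],
    "customer_management": ["completeness", "uniqueness"],
}
# Hypothetical keywords used to recognise a topic from table names.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "regulatory_submission": ("report", "submission", "regulat"),
    "customer_management": ("customer", "client"),
}
DEFAULT_TEMPLATES: List[str] = ["completeness"]


def identify_topic(table_names: List[str]) -> Optional[str]:
    """Pick the topic whose keywords hit the most table names, or None if nothing matches."""
    scores = {
        topic: sum(any(k in name.lower() for k in kws) for name in table_names)
        for topic, kws in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def match_templates(table_names: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return the identified topic together with the templates matched to it."""
    topic = identify_topic(table_names)
    return topic, TEMPLATES.get(topic, DEFAULT_TEMPLATES)


print(match_templates(["customer", "customer_address", "report_log"]))
print(match_templates(["misc"]))
```

Falling back to a default template set when no topic is recognised keeps the generating module 34 usable on models that the catalogue does not cover.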
Compared with the prior art, the technical scheme provided by the invention automatically generates the data parsing rule according to the data architecture. After the technology is adopted, the labor cost and the time cost are obviously reduced, and the coverage rate and the accuracy rate of the check are obviously improved. In practical use, the invention has low learning cost and obviously improved efficiency, and meets the application requirement.
The embodiment of the present invention is an apparatus embodiment corresponding to the above method embodiment, and specific operations of each module may be understood with reference to the description of the method embodiment, which is not described herein again.
Apparatus embodiment two
An embodiment of the present invention provides an electronic device, as shown in fig. 4, including: a memory 40, a processor 42 and a computer program stored on the memory 40 and executable on the processor 42, which computer program when executed by the processor 42 performs the steps as described in the method embodiments.
Apparatus embodiment three
An embodiment of the present invention provides a computer-readable storage medium, on which an implementation program for information transmission is stored, and when executed by the processor 42, the program implements the steps as described in the method embodiment.
The computer-readable storage medium of this embodiment includes, but is not limited to: ROM, RAM, magnetic or optical disks, and the like.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A data profiling rule generation method based on a data architecture, characterized by comprising:
establishing a data model based on a data architecture tool;
acquiring data model information of the data model, identifying a usage scenario and a management topic of the data model based on the data model information, automatically creating different management scenarios, and automatically matching different business rule templates; and
automatically identifying the constraints on the fields in the data model, automatically generating different profiling rules according to the different constraints, and generating a profiling result report.
2. The method of claim 1, further comprising:
displaying the profiling result report back in the data architecture tool.
3. The method of claim 1, wherein the data architecture tool is a data modeling tool that assists professional modelers in drawing logical and physical data models and that can generate DDL statements.
4. The method of claim 1, wherein automatically identifying the constraints on the fields in the data model and automatically generating different profiling rules according to the different constraints specifically comprises:
when the field is a primary key field, automatically generating profiling rules for non-null and unique values: when the primary key consists of a single attribute, only checking whether that field contains duplicate values and whether it contains null values; when the primary key consists of multiple attributes, jointly determining whether the multiple fields contain duplicate values and whether they contain null values;
when the field is a non-null field, automatically generating a profiling rule of the null-check class, that is, checking whether the field is null;
when the field is a relationship/foreign key, automatically generating an association profiling rule, with the association check derived from the dependency established by the relationship.
5. A data profiling rule generation device based on a data architecture, characterized by comprising:
an establishing module, configured to establish a data model based on a data architecture tool;
a matching module, configured to acquire data model information of the data model, identify a usage scenario and a management topic of the data model based on the data model information, automatically create different management scenarios, and automatically match different business rule templates; and
a generating module, configured to automatically identify the constraints on the fields in the data model, automatically generate different profiling rules according to the different constraints, and generate a profiling result report.
6. The apparatus of claim 5, further comprising:
a display-back module, configured to display the profiling result report back in the data architecture tool.
7. The apparatus of claim 5, wherein the data architecture tool is a data modeling tool that assists professional modelers in drawing logical and physical data models and that can generate DDL statements.
8. The apparatus of claim 5, wherein the generating module is specifically configured to:
when the field is a primary key field, automatically generate profiling rules for non-null and unique values: when the primary key consists of a single attribute, only check whether that field contains duplicate values and whether it contains null values; when the primary key consists of multiple attributes, jointly determine whether the multiple fields contain duplicate values and whether they contain null values;
when the field is a non-null field, automatically generate a profiling rule of the null-check class, that is, check whether the field is null;
when the field is a relationship/foreign key, automatically generate an association profiling rule, with the association check derived from the dependency established by the relationship.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the data profiling rule generation method based on a data architecture according to any one of claims 1 to 4.
10. A computer-readable storage medium, characterized in that an implementation program for information transfer is stored thereon, and the program, when executed by a processor, implements the steps of the data profiling rule generation method based on a data architecture according to any one of claims 1 to 4.
CN202211142707.7A 2022-09-20 2022-09-20 Data analysis rule generation method and device based on data architecture Pending CN115510102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211142707.7A CN115510102A (en) 2022-09-20 2022-09-20 Data analysis rule generation method and device based on data architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211142707.7A CN115510102A (en) 2022-09-20 2022-09-20 Data analysis rule generation method and device based on data architecture

Publications (1)

Publication Number Publication Date
CN115510102A true CN115510102A (en) 2022-12-23

Family

ID=84503882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211142707.7A Pending CN115510102A (en) 2022-09-20 2022-09-20 Data analysis rule generation method and device based on data architecture

Country Status (1)

Country Link
CN (1) CN115510102A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination